Title of the Invention



METHOD FOR INTEGRATING GENES AT SPECIFIC SITES IN MAMMALIAN CELLS VIA 
HOMOLOGOUS RECOMBINATION AND VECTORS FOR ACCOMPLISHING THE SAME 



Field of the Invention 

The present invention relates to a process of targeting
the integration of a desired exogenous DNA to a
specific location within the genome of a mammalian cell.
More specifically, the invention describes a novel method
for identifying a transcriptionally active target
site ("hot spot") in the mammalian genome, and inserting
a desired DNA at this site via homologous recombination.
The invention also optionally provides the ability for
gene amplification of the desired DNA at this location
by co-integrating an amplifiable selectable marker,
e.g., DHFR, in combination with the exogenous DNA. The
invention additionally describes the construction of
novel vectors suitable for accomplishing the above, and
further provides mammalian cell lines produced by such
methods which contain a desired exogenous DNA integrated
at a target hot spot.

Background

Technology for expressing recombinant proteins in
both prokaryotic and eukaryotic organisms is well
established. Mammalian cells offer significant advantages
over bacteria or yeast for protein production, resulting
from their ability to correctly assemble, glycosylate
and post-translationally modify recombinantly expressed
proteins. After transfection into the host cells,
recombinant expression constructs can be maintained as
extrachromosomal elements, or may be integrated into the
host cell genome. Generation of stably transfected
mammalian cell lines usually involves the latter; a DNA
construct encoding a gene of interest along with a drug
resistance gene (dominant selectable marker) is
introduced into the host cell, and subsequent growth in the
presence of the drug allows for the selection of cells
that have successfully integrated the exogenous DNA. In
many instances, the gene of interest is linked to a drug
resistant selectable marker which can later be subjected
to gene amplification. The gene encoding dihydrofolate
reductase (DHFR) is most commonly used for this purpose.
Growth of cells in the presence of methotrexate, a
competitive inhibitor of DHFR, leads to increased DHFR
production by means of amplification of the DHFR gene.
As flanking regions of DNA will also become amplified,
the resultant coamplification of a DHFR-linked gene in
the transfected cell line can lead to increased protein
production, thereby resulting in high level expression
of the gene of interest.

While this approach has proven successful, there
are a number of problems with the system because of the
random nature of the integration event. These problems
exist because expression levels are greatly influenced
by the effects of the local genetic environment at the
gene locus, a phenomenon well documented in the
literature and generally referred to as "position effects"
(for example, see Al-Shawi et al, Mol. Cell. Biol.,
10:1192-1198 (1990); Yoshimura et al, Mol. Cell. Biol.,
7:1296-1299 (1987)). As the vast majority of mammalian
DNA is in a transcriptionally inactive state, random
integration methods offer no control over the
transcriptional fate of the integrated DNA.
Consequently, wide variations in the expression level
of integrated genes can occur, depending on the site of
integration. For example, integration of exogenous DNA
into inactive, or transcriptionally "silent" regions of
the genome will result in little or no expression. By
contrast, integration into a transcriptionally active
site may result in high expression.

Therefore, when the goal of the work is to obtain a
high level of gene expression, as is typically the
desired outcome of genetic engineering methods, it is
generally necessary to screen large numbers of
transfectants to find such a high producing clone.
Additionally, random integration of exogenous DNA into
the genome can in some instances disrupt important
cellular genes, resulting in an altered phenotype.
These factors can make the generation of high expressing
stable mammalian cell lines a complicated and laborious
process.

Recently, our laboratory has described the use of
DNA vectors containing translationally impaired dominant
selectable markers in mammalian gene expression. (This
is disclosed in U.S. Serial No. 08/147,696, filed
November 3, 1993, recently allowed.)

These vectors contain a translationally impaired
neomycin phosphotransferase (neo) gene as the dominant
selectable marker, artificially engineered to contain an
intron into which a DHFR gene along with a gene or genes
of interest is inserted. Use of these vectors as
expression constructs has been found to significantly
reduce the total number of drug resistant colonies
produced, thereby facilitating the screening procedure in
relation to conventional mammalian expression vectors.
Furthermore, a significant percentage of the clones
obtained using this system are high expressing clones.
These results are apparently attributable to the
modifications made to the neo selectable marker. Due to
the translational impairment of the neo gene,
transfected cells will not produce enough neo protein to
survive drug selection, thereby decreasing the overall
number of drug resistant colonies. Additionally, a
higher percentage of the surviving clones will contain
the expression vector integrated into sites in the
genome where basal transcription levels are high,
resulting in overproduction of neo, thereby allowing the
cells to overcome the impairment of the neo gene.
Concomitantly, the genes of interest linked to neo will
be subject to similar elevated levels of transcription.
This same advantage is also true as a result of the
artificial intron created within neo; survival is
dependent on the synthesis of a functional neo gene,
which is in turn dependent on correct and efficient
splicing of the neo introns. Moreover, these criteria
are more likely to be met if the vector DNA has
integrated into a region which is already highly
transcriptionally active.

Following integration of the vector into a
transcriptionally active region, gene amplification is
performed by selection for the DHFR gene. Using this
system, it has been possible to obtain clones selected
using low levels of methotrexate (50 nM), containing few
(<10) copies of the vector which secrete high levels of
protein (>55 pg/cell/day). Furthermore, this can be
achieved in a relatively short period of time. However,
the success in amplification is variable. Some
transcriptionally active sites cannot be amplified and
therefore the frequency and extent of amplification from
a particular site is not predictable.

Overall, the use of these translationally impaired
vectors represents a significant improvement over other
methods of random integration. However, as discussed,
the problem of lack of control over the integration site
remains a significant concern.

One approach to overcome the problems of random
integration is by means of gene targeting, whereby the
exogenous DNA is directed to a specific locus within the
host genome. The exogenous DNA is inserted by means of
homologous recombination occurring between sequences of
DNA in the expression vector and the corresponding
homologous sequence in the genome. However, while this
type of recombination occurs at a high frequency
naturally in yeast and other fungal organisms, in higher
eukaryotic organisms it is an extremely rare event. In
mammalian cells, the frequency of homologous versus
non-homologous (random integration) recombination is
reported to range from 1/100 to 1/5000 (for example, see
Capecchi, Science, 244:1288-1292 (1989); Morrow and
Kucherlapati, Curr. Op. Biotech., 4:577-582 (1993)).
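
To put the reported ratio in perspective, the following Python sketch
treats each integrant as an independent trial and estimates how many
integrants would have to be examined, on average, before a homologous
recombinant is found when nothing selects for the targeted event. The
function names and the independence assumption are illustrative only
and are not taken from the cited reports.

```python
def homologous_fraction(random_per_homologous: float) -> float:
    """Fraction of integrants that are homologous, assuming a ratio of
    1 homologous event per `random_per_homologous` random events."""
    return 1.0 / (1.0 + random_per_homologous)


def expected_integrants_screened(random_per_homologous: float) -> float:
    """Mean number of integrants examined up to and including the first
    homologous one (geometric distribution with p = 1 / (1 + N))."""
    return 1.0 + random_per_homologous


def chance_of_at_least_one(random_per_homologous: float, screened: int) -> float:
    """Probability that `screened` independent integrants include at
    least one homologous recombinant."""
    p = homologous_fraction(random_per_homologous)
    return 1.0 - (1.0 - p) ** screened


if __name__ == "__main__":
    # The two ends of the range quoted in the text: 1/100 and 1/5000.
    for ratio in (100, 5000):
        print(ratio,
              expected_integrants_screened(ratio),
              round(chance_of_at_least_one(ratio, 500), 3))
```

Under these assumptions the screening burden grows in direct
proportion to the ratio, which is why a selection scheme that removes
random integrants altogether is attractive.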

One of the earliest reports describing homologous
recombination in mammalian cells comprised an artificial
system created in mouse fibroblasts (Thomas et al, Cell,
44:419-428 (1986)). A cell line containing a mutated,
non-functional version of the neo gene integrated into
the host genome was created, and subsequently targeted
with a second non-functional copy of neo containing a
different mutation. Reconstruction of a functional neo
gene could occur only by gene targeting. Homologous
recombinants were identified by selecting for G418
resistant cells, and confirmed by analysis of genomic
DNA isolated from the resistant clones.

Recently, the use of homologous recombination to
replace the heavy and light immunoglobulin genes at
endogenous loci in antibody secreting cells has been
reported. (U.S. Patent No. 5,202,238, Fell et al
(1993).) However, this particular approach is not
widely applicable, because it is limited to the
production of immunoglobulins in cells which
endogenously express immunoglobulins, e.g., B cells and
myeloma cells. Also, expression is limited to single
copy gene levels because co-amplification after
homologous recombination is not included. The method is
further complicated by the fact that two separate
integration events are required to produce a functional
immunoglobulin: one for the light chain gene followed by
one for the heavy chain gene.

An additional example of this type of system has
been reported in NS/0 cells, where recombinant
immunoglobulins are expressed by homologous
recombination into the immunoglobulin gamma 2A locus
(Hollis et al, international patent application #
PCT/IB95 (00014).) Expression levels obtained from this
site were extremely high - on the order of 20 pg/cell/day
from a single copy integrant. However, as in the above
example, expression is limited to this level because an
amplifiable gene is not cointegrated in this system.
Also, other researchers have reported aberrant
glycosylation of recombinant proteins expressed in NS/0
cells (for example, see Flesher et al, Biotech. and
Bioeng., 48:399-407 (1995)), thereby limiting the
applicability of this approach.

The cre-loxP recombination system from
bacteriophage P1 has recently been adapted and used as a
means of gene targeting in eukaryotic cells.
Specifically, the site specific integration of exogenous
DNA into the Chinese hamster ovary (CHO) cell genome
using cre recombinase and a series of lox containing
vectors has been described. (Fukushige and Sauer,
Proc. Natl. Acad. Sci. USA, 89:7905-7909 (1992).) This
system is attractive in that it provides for
reproducible expression at the same chromosomal
location. However, no effort was made to identify a
chromosomal site from which gene expression is optimal,
and as in the above example, expression is limited to
single copy levels in this system. Also, it is
complicated by the fact that one needs to provide for
expression of a functional recombinase enzyme in the
mammalian cell.

The use of homologous recombination between an
introduced DNA sequence and its endogenous chromosomal
locus has also been reported to provide a useful means
of genetic manipulation in mammalian cells, as well as
in yeast cells. (See e.g., Bradley et al, Meth.
Enzymol., 223:855-879 (1993); Capecchi, Science,
244:1288-1292 (1989); Rothstein et al, Meth. Enzymol.,
194:281-301 (1991)). To date, most mammalian gene
targeting studies have been directed toward gene
disruption ("knockout") or site-specific mutagenesis of
selected target gene loci in mouse embryonic stem (ES)
cells. The creation of these "knockout" mouse models
has enabled scientists to examine specific
structure-function issues and examine the biological
importance of a myriad of mouse genes. This field of
research also has important implications in terms of
potential gene therapy applications.

Also, vectors have recently been reported by
Celltech (Kent, U.K.) which purportedly are targeted to
transcriptionally active sites in NS0 cells, which do
not require gene amplification (Peakman et al, Hum.
Antibod. Hybridomas, 5:65-74 (1994)). However, levels
of immunoglobulin secretion in these unamplified cells
have not been reported to exceed 20 pg/cell/day, while in
amplified CHO cells, levels as high as 100 pg/cell/day
can be obtained (Id.).

It would be highly desirable to develop a gene
targeting system which reproducibly provided for the
integration of exogenous DNA into a predetermined site
in the genome known to be transcriptionally active.
Also, it would be desirable if such a gene targeting
system would further facilitate co-amplification of the
inserted DNA after integration. The design of such a
system would allow for the reproducible and high level
expression of any cloned gene of interest in a mammalian
cell, and undoubtedly would be of significant interest
to many researchers.

In this application, we provide a novel mammalian
expression system, based on homologous recombination
occurring between two artificial substrates contained in
two different vectors. Specifically, this system uses a
combination of two novel mammalian expression vectors,
referred to as a "marking" vector and a "targeting"
vector.

Essentially, the marking vector enables the
identification and marking of a site in the mammalian genome
which is transcriptionally active, i.e., a site at which
gene expression levels are high. This site can be
regarded as a "hot spot" in the genome. After
integration of the marking vector, the subject expression
system enables another DNA to be integrated at this site,
i.e., the targeting vector, by means of homologous
recombination occurring between DNA sequences common to
both vectors. This system affords significant
advantages over other homologous recombination systems.

Unlike most other homologous systems employed in
mammalian cells, this system exhibits no background.
Therefore, cells which have only undergone random
integration of the vector do not survive the selection.
Thus, any gene of interest cloned into the targeting
plasmid is expressed at high levels from the marked hot
spot. Accordingly, the subject method of gene
expression substantially or completely eliminates the problems
inherent to systems of random integration, discussed in
detail above. Moreover, this system provides
reproducible and high level expression of any recombinant
protein at the same transcriptionally active site in the
mammalian genome. In addition, gene amplification may
be effected at this particular transcriptionally active
site by including an amplifiable dominant selectable
marker (e.g., DHFR) as part of the marking vector.

Objects of the Invention 

Thus, it is an object of the invention to provide
an improved method for targeting a desired DNA to a
specific site in a mammalian cell.

It is a more specific object of the invention to
provide a novel method for targeting a desired DNA to a
specific site in a mammalian cell via homologous
recombination.

It is another specific object of the invention to
provide novel vectors for achieving site specific
integration of a desired DNA in a mammalian cell.

It is still another object of the invention to
provide novel mammalian cell lines which contain a
desired DNA integrated at a predetermined site which
provides for high expression.

It is a more specific object of the invention to
provide a novel method for achieving site specific
integration of a desired DNA in a Chinese hamster ovary
(CHO) cell.

It is another more specific object of the invention
to provide a novel method for integrating immunoglobulin
genes, or any other genes, in mammalian cells at
predetermined chromosomal sites that provide for high
expression.

It is another specific object of the invention to
provide novel vectors and vector combinations suitable
for integrating immunoglobulin genes into mammalian
cells at predetermined sites that provide for high
expression.

It is another object of the invention to provide
mammalian cell lines which contain immunoglobulin genes
integrated at predetermined sites that provide for high
expression.

It is an even more specific object of the invention
to provide a novel method for integrating immunoglobulin
genes into CHO cells that provide for high expression,
as well as novel vectors and vector combinations that
provide for such integration of immunoglobulin genes
into CHO cells.

In addition, it is a specific object of the
invention to provide novel CHO cell lines which contain
immunoglobulin genes integrated at predetermined sites that
provide for high expression, and have been amplified by
methotrexate selection to secrete even greater amounts
of functional immunoglobulins.

Brief Description of the Figures

Figure 1 depicts a map of a marking plasmid
according to the invention referred to as Desmond. The
plasmid is shown in circular form (1a) as well as a
linearized version used for transfection (1b).

Figure 2(a) shows a map of a targeting plasmid
referred to as "Molly". Molly is shown here encoding the
anti-CD20 immunoglobulin genes, expression of which is
described in Example 1.

Figure 2(b) shows a linearized version of Molly,
after digestion with the restriction enzymes KpnI and
PacI. This linearized form was used for transfection.

Figure 3 depicts the potential alignment between
Desmond sequences integrated into the CHO genome, and
incoming targeting Molly sequences. One potential
arrangement of Molly integrated into Desmond after
homologous recombination is also presented.

Figure 4 shows a Southern analysis of single copy
Desmond clones. Samples are as follows:
Lane 1: λHindIII DNA size marker
Lane 2: Desmond clone 10F3
Lane 3: Desmond clone 10C12
Lane 4: Desmond clone 15C9
Lane 5: Desmond clone 14B5
Lane 6: Desmond clone 9B2

Figure 5 shows a Northern analysis of single copy
Desmond clones. Samples are as follows: Panel A:
northern probed with CAD and DHFR probes, as indicated
on the figure. Panel B: duplicate northern, probed with
CAD and HisD probes, as indicated. The RNA samples
loaded in panels A and B are as follows:
Lane 1: clone 9B2; lane 2: clone 10C12; lane 3: clone
14B5; lane 4: clone 15C9; lane 5: control RNA from CHO
transfected with a HisD and DHFR containing plasmid;
lane 6: untransfected CHO.

Figure 6 shows a Southern analysis of clones
resulting from the homologous integration of Molly into
Desmond. Samples are as follows:

Lane 1: λHindIII DNA size markers; lane 2: 20F4; lane 3:
5F9; lane 4: 21C7; lane 5: 24G2; lane 6: 25E1; lane 7:
28C9; lane 8: 29F9; lane 9: 39G11; lane 10: 42F9; lane
11: 50G10; lane 12: Molly plasmid DNA, linearized with
BglII (top band) and cut with BglII and KpnI (lower
band); lane 13: untransfected Desmond.

Figures 7A through 7G contain the Sequence Listing
for Desmond.

Figures 8A through 8I contain the Sequence Listing
for Molly-containing anti-CD20.

Figure 9 contains a map of the targeting plasmid,
"Mandy," shown here encoding anti-CD23 genes, the
expression of which is disclosed in Example 5.

Figures 10A through 10N contain the sequence
listing of "Mandy" containing the anti-CD23 genes as
disclosed in Example 5.

Detailed Description of the Invention

The invention provides a novel method for integrat- 
ing a desired exogenous DNA at a target site within the 
genome of a mammalian cell via homologous recombination. 
Also, the invention provides novel vectors for achieving 
the site specific integration of a DNA at a target site 
in the genome of a mammalian cell. 

More specifically, the subject cloning method
provides for site specific integration of a desired DNA in
a mammalian cell by transfection of such cell with a
"marker plasmid" which contains a unique sequence that
is foreign to the mammalian cell genome and which
provides a substrate for homologous recombination,
followed by transfection with a "target plasmid" containing
a sequence which provides for homologous recombination
with the unique sequence contained in the marker
plasmid, and further comprising a desired DNA that is to
be integrated into the mammalian cell. Typically, the
integrated DNA will encode a protein of interest, such
as an immunoglobulin or other secreted mammalian
glycoprotein.

The exemplified homologous recombination system
uses the neomycin phosphotransferase gene as a dominant
selectable marker. This particular marker was utilized
based on the following previously published
observations:

(i) the demonstrated ability to target and restore
function to a mutated version of the neo gene (cited
earlier) and

(ii) our development of translationally impaired
expression vectors, in which the neo gene has been
artificially created as two exons with a gene of interest
inserted in the intervening intron; neo exons are
correctly spliced and translated in vivo, producing a
functional protein and thereby conferring G418 resistance on
the resultant cell population. In this application, the
neo gene is split into three exons. The third exon of
neo is present on the "marker" plasmid and becomes
integrated into the host cell genome upon integration of the
marker plasmid into the mammalian cells. Exons 1 and 2
are present on the targeting plasmid, and are separated
by an intervening intron into which at least one gene of
interest is cloned. Homologous recombination of the
targeting vector with the integrated marking vector
results in correct splicing of all three exons of the
neo gene and thereby expression of a functional neo
protein (as determined by selection for G418 resistant
colonies). Prior to designing the current expression
system, we had experimentally tested the functionality
of such a triply spliced neo construct in mammalian
cells. The results of this control experiment indicated
that all three neo exons were properly spliced and
therefore suggested the feasibility of the subject
invention.
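
The selection logic of the split neo gene can be sketched as a small
Python model. This is a deliberately simplified illustration, not a
description of the vectors themselves: the class and function names are
hypothetical, and the model assumes that a cell is G418 resistant only
when a single genomic locus carries neo exons 1, 2 and 3 in order.

```python
from dataclasses import dataclass

# Order in which the three neo exons must appear at one locus
# (assumption of this sketch) for a functional transcript.
NEO_EXONS = ("exon1", "exon2", "exon3")


@dataclass(frozen=True)
class Locus:
    """Ordered neo exons present at one genomic integration site."""
    exons: tuple


def is_g418_resistant(loci) -> bool:
    """True if any single locus carries all three exons in order."""
    return any(locus.exons == NEO_EXONS for locus in loci)


def homologous_integration(marked_site: Locus, incoming: tuple) -> Locus:
    """Targeting plasmid exons join the exon already at the marked site."""
    return Locus(incoming + marked_site.exons)


def random_integration(incoming: tuple) -> Locus:
    """Targeting plasmid lands elsewhere; its exons stay on their own."""
    return Locus(incoming)


# Marking vector contributes exon 3; targeting vector carries exons 1 and 2.
marked_site = Locus(("exon3",))
targeting_exons = ("exon1", "exon2")

marked_only = [marked_site]
random_integrant = [marked_site, random_integration(targeting_exons)]
targeted_cell = [homologous_integration(marked_site, targeting_exons)]
```

In this model only `targeted_cell` satisfies `is_g418_resistant`, which
mirrors the reasoning that random integrants cannot assemble a complete
neo gene.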

However, while the present invention is exemplified 
using the neo gene, and more specifically a triple split 
neo gene, the general methodology should be efficacious 
with other dominant selectable markers. 

As discussed in greater detail infra, the present
invention affords numerous advantages over conventional
gene expression methods, including both random
integration and gene targeting methods. Specifically, the
subject invention provides a method which reproducibly
allows for site-specific integration of a desired DNA
into a transcriptionally active domain of a mammalian
cell. Moreover, because the subject method introduces
an artificial region of "homology" which acts as a
unique substrate for homologous recombination and the
insertion of a desired DNA, the efficacy of the subject
invention does not require that the cell endogenously
contain or express a specific DNA. Thus, the method is
generically applicable to all mammalian cells, and can
be used to express any type of recombinant protein.

The use of a triply spliced selectable marker,
e.g., the exemplified triply spliced neo construct,
guarantees that all G418 resistant colonies produced
will arise from a homologous recombination event (random
integrants will not produce a functional neo gene and
consequently will not survive G418 selection). Thus,
the subject invention makes it easy to screen for the
desired homologous event. Furthermore, the frequency of
additional random integrations in a cell that has
undergone a homologous recombination event appears to be low.

Based on the foregoing, it is apparent that a
significant advantage of the invention is that it
substantially reduces the number of colonies that need be
screened to identify high producer clones, i.e., cell
lines containing a desired DNA which secrete the
corresponding protein at high levels. On average, clones
containing integrated desired DNA may be identified by
screening about 5 to 20 colonies (compared to several
thousand which must be screened when using standard
random integration techniques, or several hundred using
the previously described intronic insertion vectors).
Additionally, as the site of integration was preselected
and comprises a transcriptionally active domain, all
exogenous DNA expressed at this site should produce
comparable, i.e., high, levels of the protein of interest.

Moreover, the subject invention is further
advantageous in that it enables an amplifiable gene to be
inserted on integration of the marking vector. Thus,
when a desired gene is targeted to this site via
homologous recombination, the subject invention allows
for expression of the gene to be further enhanced by
gene amplification. In this regard, it has been
reported in the literature that different genomic
sites have different capacities for gene amplification
(Meinkoth et al, Mol. Cell. Biol., 7:1415-1424 (1987)).
Therefore, this technique is further advantageous as it
allows for the placement of a desired gene of interest
at a specific site that is both transcriptionally active
and easily amplified. Therefore, this should
significantly reduce the amount of time required to isolate
such high producers.

Specifically, while conventional methods for the
construction of high expressing mammalian cell lines can
take 6 to 9 months, the present invention allows for
such clones to be isolated on average after only about
3-6 months. This is due to the fact that conventionally
isolated clones typically must be subjected to at least
three rounds of drug resistant gene amplification in
order to reach satisfactory levels of gene expression.
As the homologously produced clones are generated from a
preselected site which is a high expression site, fewer
rounds of amplification should be required before
reaching a satisfactory level of production.

Still further, the subject invention enables the
reproducible selection of high producer clones wherein
the vector is integrated at low copy number, typically
single copy. This is advantageous as it enhances the
stability of the clones and avoids other potential
adverse side-effects associated with high copy number. As
described supra, the subject homologous recombination
system uses the combination of a "marker plasmid" and a
"targeting plasmid" which are described in more detail
below.

The "marker plasmid" which is used to mark and
identify a transcriptional hot spot will comprise at
least the following sequences:

(i) a region of DNA that is heterologous or unique
to the genome of the mammalian cell, which functions as
a source of homology and allows for homologous
recombination (with a DNA contained in a second target plasmid).
More specifically, the unique region of DNA (i) will
generally comprise a bacterial, viral, yeast, synthetic,
or other DNA which is not normally present in the
mammalian cell genome and which further does not
comprise significant homology or sequence identity to
DNA contained in the genome of the mammalian cell.
Essentially, this sequence should be sufficiently
different from mammalian DNA that it will not
significantly recombine with the host cell genome via
homologous recombination. The size of such unique DNA
will generally be at least about 2 to 10 kilobases,
or higher, more preferably at least about 10 kb, as
several other investigators have noted an increased
frequency of targeted recombination as the size of the
homology region is increased (Capecchi, Science,
244:1288-1292 (1989)).

The upper size limit of the unique DNA which acts
as a site for homologous recombination with a sequence
in the second target vector is largely dictated by
potential stability constraints (if DNA is too large it
may not be easily integrated into a chromosome) and the
difficulties in working with very large DNAs.

(ii) a DNA including a fragment of a selectable
marker DNA, typically an exon of a dominant selectable
marker gene. The only essential feature of this DNA is
that it not encode a functional selectable marker
protein unless it is expressed in association with a
sequence contained in the target plasmid. Typically, the
target plasmid will comprise the remaining exons of the
dominant selectable marker gene (those not comprised in
the "marker" plasmid). Essentially, a functional
selectable marker should only be produced if homologous
recombination occurs (resulting in the association and
expression of this marker DNA (ii) sequence together with
the portion(s) of the selectable marker DNA fragment
which is (are) contained in the target plasmid).

As noted, the current invention exemplifies the
use of the neomycin phosphotransferase gene as the
dominant selectable marker which is "split" in the two
vectors. However, other selectable markers should also be
suitable, e.g., the Salmonella histidinol dehydrogenase
gene, hygromycin phosphotransferase gene, herpes simplex
virus thymidine kinase gene, adenosine deaminase gene,
glutamine synthetase gene and hypoxanthine-guanine
phosphoribosyl transferase gene.

(iii) a DNA which encodes a functional selectable
marker protein, which selectable marker is different
from the selectable marker DNA (ii). This selectable
marker provides for the successful selection of
mammalian cells wherein the marker plasmid is successfully
integrated into the cellular DNA. More preferably, it
is desirable that the marker plasmid comprise two such
dominant selectable marker DNAs, situated at opposite
ends of the vector. This is advantageous as it enables
integrants to be selected using different selection
agents and further enables cells which contain the
entire vector to be selected. Additionally, one marker
can be an amplifiable marker to facilitate gene
amplification as discussed previously. Any of the
dominant selectable markers listed in (ii) can be used as
well as others generally known in the art.

Moreover, the marker plasmid may optionally further
comprise a rare endonuclease restriction site. This is
potentially desirable as this may facilitate cleavage.
If present, such rare restriction site should be
situated close to the middle of the unique region that acts as
a substrate for homologous recombination. Preferably
such sequence will be at least about 12 nucleotides.
The introduction of a double stranded break by similar
methodology has been reported to enhance the frequency
of homologous recombination (Choulika et al, Mol.
Cell. Biol., 15:1968-1973 (1995)). However, the
presence of such sequence is not essential.

The "targeting plasmid" will comprise at least the
following sequences:

(1) the same unique region of DNA contained in the
marker plasmid or one having sufficient homology or
sequence identity therewith that said DNA is capable of
combining via homologous recombination with the unique
region (i) in the marker plasmid. Suitable types of
DNAs are described supra in the description of the
unique region of DNA (i) in the marker plasmid.

(2) The remaining exons of the dominant selectable
marker, one exon of which is included as (ii) in the
marker plasmid listed above. The essential features of
this DNA fragment are that it result in a functional
(selectable) marker protein only if the target plasmid
integrates via homologous recombination (wherein such
recombination results in the association of this DNA
with the other fragment of the selectable marker DNA
contained in the marker plasmid) and further that it
allow for insertion of a desired exogenous DNA.
Typically, this DNA will comprise the remaining exons of
the selectable marker DNA which are separated by an
intron. For example, this DNA may comprise the first two
exons of the neo gene and the marker plasmid may comprise
the third exon (back third of neo).

(3) The target plasmid will also comprise a desired
DNA, e.g., one encoding a desired polypeptide,
preferably inserted within the selectable marker DNA
fragment contained in the plasmid. Typically, the DNA
will be inserted in an intron which is comprised between
the exons of the selectable marker DNA. This ensures
that the desired DNA is also integrated if homologous
recombination of the target plasmid and the marker
plasmid occurs. This intron may be naturally occurring or
it may be engineered into the dominant selectable marker
DNA fragment.

This DNA will encode any desired protein,
preferably one having pharmaceutical or other desirable
properties. Most typically the DNA will encode a
mammalian protein, and in the current examples provided,
an immunoglobulin or an immunoadhesin. However, the
invention is not in any way limited to the production of
immunoglobulins.

As discussed previously, the subject cloning method
is suitable for any mammalian cell as it does not
require for efficacy that any specific mammalian sequence
or sequences be present. In general, such mammalian
cells will comprise those typically used for protein
expression, e.g., CHO cells, myeloma cells, COS cells,
BHK cells, Sp2/0 cells, NIH 3T3 and HeLa cells. In the
examples which follow, CHO cells were utilized. The
advantages thereof include the availability of suitable
growth medium, their ability to grow efficiently and to
high density in culture, and their ability to express
mammalian proteins such as immunoglobulins in
biologically active form.

Further, CHO cells were selected in large part
because of previous usage of such cells by the inventors
for the expression of immunoglobulins (using the
translationally impaired dominant selectable marker
containing vectors described previously). Thus, the
present laboratory has considerable experience in using
such cells for expression. However, based on the examples
which follow, it is reasonable to expect similar results
will be obtained with other mammalian cells.

In general, transformation or transfection of
mammalian cells according to the subject invention will
be effected according to conventional methods. So that
the invention may be better understood, the construction
of exemplary vectors and their usage in producing
integrants is described in the examples below.

EXAMPLE 1 

Design and Preparation of Marker
and Targeting Plasmid DNA Vectors

The marker plasmid herein referred to as "Desmond" 
was assembled from the following DNA elements: 

(a) Murine dihydrofolate reductase gene (DHFR),
incorporated into a transcription cassette, comprising
the mouse beta globin promoter 5' to the DHFR start
site, and bovine growth hormone polyadenylation signal
3' to the stop codon. The DHFR transcriptional cassette
was isolated from TCAE6, an expression vector created
previously in this laboratory (Newman et al, 1992,
Biotechnology, 10:1455-1460).

(b) E. coli β-galactosidase gene, commercially
available, obtained from Promega as pSV-b-galactosidase
control vector, catalog # E1081.

(c) Baculovirus DNA, commercially available,
purchased from Clontech as pBAKPAK8, cat # 6145-1.

(d) Cassette comprising promoter and enhancer
elements from cytomegalovirus and SV40 virus. The
cassette was generated by PCR using a derivative of
expression vector TCAE8 (Reff et al, Blood, 83:435-445
(1994)). The enhancer cassette was inserted within the
baculovirus sequence, which was first modified by the
insertion of a multiple cloning site.

(e) E. coli GUS (glucuronidase) gene, commercially
available, purchased from Clontech as pB101, cat. #
6017-1.

(f) Firefly luciferase gene, commercially
available, obtained from Promega as pGEM-Luc (catalog #
E1541).

(g) S. typhimurium histidinol dehydrogenase gene
(HisD). This gene was originally a gift (Donahue
et al, Gene, 18:47-59 (1982)), and has subsequently been
incorporated into a transcription cassette comprising
the mouse beta globin major promoter 5' to the gene, and
the SV40 polyadenylation signal 3' to the gene.

The DNA elements described in (a) - (g) were combined 
into a pBR derived plasmid backbone to produce a 7.7kb 
contiguous stretch of DNA referred to in the attached 
figures as "homology". Homology in this sense refers to 
sequences of DNA which are not part of the mammalian 
genome and are used to promote homologous recombination 
between transfected plasmids sharing the same homology 
DNA sequences. 

(h) Neomycin phosphotransferase gene from Tn5 (Davis
and Smith, Ann. Rev. Micro., 32:469-518 (1978)).
The complete neo gene was subcloned into pBluescript
SK- (Stratagene catalog # 212205) to facilitate genetic
manipulation. A synthetic linker was then inserted into
a unique PstI site occurring across the codons for amino
acids 51 and 52 of neo. This linker encoded the
necessary DNA elements to create an artificial splice
donor site, intervening intron and splice acceptor site
within the neo gene, thus creating two separate exons,
presently referred to as neo exon 1 and 2. Neo exon 1
encodes the first 51 amino acids of neo, while exon 2
encodes the remaining 203 amino acids plus the stop
codon of the protein. A NotI cloning site was also
created within the intron.



Neo exon 2 was further subdivided to produce neo
exons 2 and 3. This was achieved as follows: A set of
PCR primers was designed to amplify a region of DNA
encoding neo exon 1, intron and the first 111 2/3 amino
acids of exon 2. The 3' PCR primer resulted in the
introduction of a new 5' splice site immediately after
the second nucleotide of the codon for amino acid 111 in
exon 2, therefore generating a new smaller exon 2. The
DNA fragment now encoding the original exon 1, intron
and new exon 2 was then subcloned and propagated in a
pBR based vector. The remainder of the original exon 2
was used as a template for another round of PCR
amplification, which generated "exon 3". The 5' primer
for this round of amplification introduced a new splice
acceptor site at the 5' side of the newly created exon
3, i.e. before the final nucleotide of the codon for
amino acid 111. The resultant 3 exons of neo encode the
following information: exon 1 - the first 51 amino acids
of neo; exon 2 - the next 111 2/3 amino acids, and exon
3 - the final 91 1/3 amino acids plus the translational
stop codon of the neo gene.
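
The exon boundaries stated above can be checked with simple arithmetic. The following Python sketch is only an illustration and is not part of the cloning procedure; it converts the stated amino acid counts into nucleotide counts using exact fractions, and all variable names are hypothetical.

```python
from fractions import Fraction

# Coding length of each neo exon in amino acids, as stated in the text.
# Fractional values describe a codon that is split by an intron.
exon_aa = {
    "exon 1": Fraction(51),
    "exon 2": Fraction(111) + Fraction(2, 3),
    "exon 3": Fraction(91) + Fraction(1, 3),
}

# Three nucleotides per codon; the stop codon is not counted here.
exon_nt = {name: aa * 3 for name, aa in exon_aa.items()}

total_aa = sum(exon_aa.values())
total_nt = sum(exon_nt.values())

for name, nt in exon_nt.items():
    print(f"{name}: {exon_aa[name]} aa = {nt} nt")
print(f"total: {total_aa} aa = {total_nt} nt (excluding the stop codon)")
```

Under these figures, the new exons 2 and 3 together account for 203 amino acids, which matches the length given for the original exon 2, and every exon corresponds to a whole number of nucleotides.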

Neo exon 3 was incorporated along with the above
mentioned DNA elements into the marking plasmid
"Desmond". Neo exons 1 and 2 were incorporated into the
targeting plasmid "Molly". The NotI cloning site
created within the intron between exons 1 and 2 was used
in subsequent cloning steps to insert genes of interest
into the targeting plasmid.

A second targeting plasmid "Mandy" was also
generated. This plasmid is almost identical to "Molly"
(some restriction sites on the vector have been changed)
except that the original HisD and DHFR genes contained
in "Molly" were inactivated. These changes were
incorporated because the Desmond cell line was no longer
being cultured in the presence of histidinol, therefore
it seemed unnecessary to include a second copy of the
HisD gene. Additionally, the DHFR gene was inactivated
to ensure that only a single DHFR gene, namely the one
present in the Desmond marked site, would be amplifiable
in any resulting cell lines. "Mandy" was derived from
"Molly" by the following modifications:

(i) A synthetic linker was inserted in the middle
of the DHFR coding region. This linker created a stop
codon and shifted the remainder of the DHFR coding
region out of frame, therefore rendering the gene
nonfunctional.

(ii) A portion of the HisD gene was deleted and
replaced with a PCR generated HisD fragment lacking the
promoter and start codon of the gene.

Figure 1 depicts the arrangement of these DNA
elements in the marker plasmid "Desmond". Figure 2
depicts the arrangement of these elements in the first
targeting plasmid, "Molly". Figure 3 illustrates the
possible arrangement in the CHO genome of the various
DNA elements after targeting and integration of Molly
DNA into Desmond marked CHO cells. Figure 9 depicts the
targeting plasmid "Mandy."

Construction of the marking and targeting plasmids
from the above listed DNA elements was carried out
following conventional cloning techniques (see, e.g.,
Molecular Cloning, A Laboratory Manual, J. Sambrook et
al, 1987, Cold Spring Harbor Laboratory Press, and
Current Protocols in Molecular Biology, F. M. Ausubel et
al, eds., 1987, John Wiley and Sons). All plasmids were
propagated and maintained in E. coli XL1 Blue
(Stratagene, cat. # 200236). Large scale plasmid
preparations were prepared using Promega Wizard Maxiprep
DNA Purification System®, according to the
manufacturer's directions.

EXAMPLE 2

Construction of a Marked CHO Cell Line

1. Cell Culture and Transfection Procedures to
Produce Marked CHO Cell Line

Marker plasmid DNA was linearized by digestion
overnight at 37°C with Bst1107I. Linearized vector was
ethanol precipitated and resuspended in sterile TE to a
concentration of 1mg/ml. Linearized vector was
introduced into DHFR- Chinese hamster ovary cells (CHO
cells) DG44 cells (Urlaub et al, Som. Cell and Mol.
Gen., 12:555-566 (1986)) by electroporation as follows.

Exponentially growing cells were harvested by
centrifugation, washed once in ice cold SBS (sucrose
buffered solution, 272mM sucrose, 7mM sodium phosphate,
pH 7.4, 1mM magnesium chloride) then resuspended in SBS
to a concentration of 10^ cells/ml. After a 15 minute
incubation on ice, 0.4ml of the cell suspension was
mixed with 40μg linearized DNA in a disposable
electroporation cuvette. Cells were shocked using a BTX
electrocell manipulator (San Diego, CA) set at 230
volts, 400 microfaraday capacitance, 13 ohm resistance.
Shocked cells were then mixed with 20ml of prewarmed
CHO growth media (CHO-S-SFMII, Gibco/BRL, catalog #
31033-012) and plated in 96 well tissue culture plates.
Forty eight hours after electroporation, plates were fed
with selection media (in the case of transfection with
Desmond, selection media is CHO-S-SFMII without
hypoxanthine or thymidine, supplemented with 2mM
histidinol (Sigma catalog # H6647)). Plates were
maintained in selection media for up to 30 days, or until
some of the wells exhibited cell growth. These cells
were then removed from the 96 well plates and expanded
ultimately to 120 ml spinner flasks where they were
maintained in selection media at all times.

EXAMPLE 3 

Characterization of Marked CHO Cell Line

(a) Southern Analysis

Genomic DNA was isolated from all stably growing
Desmond marked CHO cells. DNA was isolated using the
Invitrogen Easy® DNA kit, according to the
manufacturer's directions. Genomic DNA was then digested
with HindIII overnight at 37°C, and subjected to
Southern analysis using a PCR generated digoxygenin
labelled probe specific to the DHFR gene. Hybridizations
and washes were carried out using Boehringer Mannheim's
DIG easy hyb (catalog # 1603 558) and DIG Wash and Block
Buffer Set (catalog # 1585 762) according to the
manufacturer's directions. DNA samples containing a
single band hybridizing to the DHFR probe were assumed
to be Desmond clones arising from a single cell which
had integrated a single copy of the plasmid. These
clones were retained for further analysis. Out of a
total of 45 HisD resistant cell lines isolated, only 5
were single copy integrants. Figure 4 shows a Southern
blot containing all 5 of these single copy Desmond
clones. Clone names are provided in the figure legend.

(b) Northern Analysis

Total RNA was isolated from all single copy Desmond
clones using TRIzol reagent (Gibco/BRL cat # 15596-026)
according to the manufacturer's directions. 10-20μg RNA
from each clone was analyzed on duplicate formaldehyde
gels. The resulting blots were probed with PCR
generated digoxygenin labelled DNA probes to (i) DHFR
message, (ii) HisD message and (iii) CAD message. CAD
is a trifunctional protein involved in uridine
biosynthesis (Wahl et al, J. Biol. Chem., 254,
17:8679-8689 (1979)), and is expressed equally in all
cell types. It is used here as an internal control to
help quantitate RNA loading. Hybridizations and washes
were carried out using the above mentioned Boehringer
Mannheim reagents. The results of the Northern analysis
are shown in Figure 5. The single copy Desmond clone
exhibiting the highest levels of both the HisD and DHFR
message is clone 15C9, shown in lane 4 in both panels of
the figure. This clone was designated as the "marked
cell line" and used in future targeting experiments in
CHO, examples of which are presented in the following
sections.






EXAMPLE 4 

Expression of Anti-CD20 Antibody
in Desmond Marked CHO Cells

C2B8, a chimeric antibody which recognizes B-cell 
surface antigen CD20, has been cloned and expressed 
previously in our laboratory (Reff et al, Blood,
83:435-445 (1994)). A 4.1 kb DNA fragment comprising the
C2B8 light and heavy chain genes, along with the neces- 
sary regulatory elements (eukaryotic promoter and poly- 
adenylation signals) was inserted into the artificial 
intron created between exons 1 and 2 of the neo gene 
contained in a pBR derived cloning vector. This newly 
generated 5kb DNA fragment (comprising neo exon 1, C2B8 
and neo exon 2) was excised and used to assemble the 
targeting plasmid Molly. The other DNA elements used in 
the construction of Molly are identical to those used to 
construct the marking plasmid Desmond, identified 
previously. A complete map of Molly is shown in Fig. 2. 

The targeting vector Molly was linearized prior to
transfection by digestion with KpnI and PacI, ethanol
precipitated and resuspended in sterile TE to a
concentration of 1.5mg/mL. Linearized plasmid was
introduced into exponentially growing Desmond marked
cells essentially as described, except that 80μg DNA was
used in each electroporation. Forty eight hours
postelectroporation, 96 well plates were supplemented
with selection medium (CHO-S-SFMII supplemented with
400μg/mL Geneticin (G418, Gibco/BRL catalog #
10131-019)). Plates were maintained in selection medium
for up to 30 days, or until cell growth occurred in some
of the wells. Such growth was assumed to be the result
of clonal expansion of a single G418 resistant cell. The
supernatants from all G418 resistant wells were assayed
for C2B8 production by standard ELISA techniques, and
all productive clones were eventually expanded to 120mL
spinner flasks and further analyzed.
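
As a worked illustration of the DNA quantities quoted for the two electroporations (40μg of Desmond at 1mg/ml in Example 2, 80μg of Molly at 1.5mg/mL here), the following Python sketch computes the volume of resuspended DNA that delivers a given mass. It assumes the stock is used undiluted, which the text does not state explicitly, and the function name is hypothetical.

```python
def dna_volume_ul(mass_ug: float, stock_mg_per_ml: float) -> float:
    """Volume (microliters) of stock that contains mass_ug of DNA.

    1 mg/ml equals 1 ug/ul, so ug divided by mg/ml gives ul.
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return mass_ug / stock_mg_per_ml


# Desmond marker plasmid: 40 ug from a 1 mg/ml stock (Example 2).
desmond_ul = dna_volume_ul(40, 1.0)
# Molly targeting plasmid: 80 ug from a 1.5 mg/mL stock (Example 4).
molly_ul = dna_volume_ul(80, 1.5)

print(f"Desmond: {desmond_ul:.1f} ul")
print(f"Molly: {molly_ul:.1f} ul")
```

On these assumptions, each DNA addition is small relative to the 0.4ml cell suspension placed in the cuvette.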

Characterization of Antibody Secreting Targeted Cells

A total of 50 electroporations with Molly targeting
plasmid were carried out in this experiment, each of
which was plated into separate 96 well plates. A total
of 10 viable, anti-CD20 antibody secreting clones were
obtained and expanded to 120ml spinner flasks. Genomic
DNA was isolated from all clones, and Southern analyses
were subsequently performed to determine whether the
clones represented single homologous recombination
events or whether additional random integrations had
occurred in the same cells. The methods for DNA
isolation and Southern hybridization were as described in
the previous section. Genomic DNA was digested with EcoRI
and probed with a PCR generated digoxygenin labelled
probe to a segment of the CD20 heavy chain constant
region. The results of this Southern analysis are
presented in Figure 6. As can be seen in the figure, 8 of
the 10 clones show a single band hybridizing to the CD20
probe, indicating a single homologous recombination
event has occurred in these cells. Two of the ten,
clones 24G2 and 28C9, show the presence of additional
band(s), indicative of an additional random integration
elsewhere in the genome.

We examined the expression levels of anti-CD20
antibody in all ten of these clones, the data for which
are shown in Table 1, below.

Table 1:

Expression Level of Anti-CD20
Secreting Homologous Integrants

Clone      Anti-CD20 (pg/c/d)
20F4       3.5
25E1       2.4
42F9       1.8
39G11      1.5
21C7       1.3
50G10      0.9
29F9       0.8
5F9        0.3
28C9*      4.5
24G2*      2.1

* These clones contained additional randomly
integrated copies of anti-CD20. Expression
levels of these clones therefore reflect a
contribution from both the homologous and
random sites.

Expression levels are reported as picograms per cell per
day (pg/c/d) secreted by the individual clones, and
represent the mean levels obtained from three separate
ELISAs on samples taken from 120 mL spinner flasks.

As can be seen from the data, there is a variation
in antibody secretion of approximately ten fold between
the highest and lowest clones. This was somewhat
unexpected as we anticipated similar expression levels
from all clones due to the fact the anti-CD20 genes are
all integrated into the same Desmond marked site.
Nevertheless, this observed range in expression is
extremely small in comparison to that seen using any
traditional random integration method or with our
translationally impaired vector system.
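
The spread described above can be reproduced directly from the Table 1 values. The Python sketch below is an illustration only; it computes the ratio of highest to lowest expression, first for the clones carrying a single homologous integrant and then for all ten clones. The helper name `fold_range` is hypothetical.

```python
# Expression levels (pg/cell/day) taken from Table 1.
single_copy = {
    "20F4": 3.5, "25E1": 2.4, "42F9": 1.8, "39G11": 1.5,
    "21C7": 1.3, "50G10": 0.9, "29F9": 0.8, "5F9": 0.3,
}
# Clones carrying an additional random integrant (starred in Table 1).
extra_integrant = {"28C9": 4.5, "24G2": 2.1}


def fold_range(levels):
    """Ratio of the highest to the lowest expression level."""
    values = list(levels.values())
    return max(values) / min(values)


single_fold = fold_range(single_copy)
all_fold = fold_range({**single_copy, **extra_integrant})

print(f"single homologous integrants: {single_fold:.1f}-fold")
print(f"all ten clones: {all_fold:.1f}-fold")
```

Separating the starred clones matters because their expression reflects a contribution from both the homologous and the random integration sites.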

Clone 20F4, the highest producing single copy
integrant, was selected for further study. Table 2 (below)
presents ELISA and cell culture data from seven day
production runs of this clone.

Table 2:

7 Day Production Run Data for 20F4

Day   % viable   viable/ml   Tx2 (hr)   mg/L   pg/c/d
                 (x 10^)
1     96         3.4         31         1.3    4.9
2     94         6           29         2.5    3.4
3     94         9.9         33         4.7    3.2
4     90         17.4        30         6.8    3
5     73         14                     8.3
6     17         3.5                    9.5





Clone 20F4 was seeded at 2x10^1 in a 120ml spinner 
flask on day 0. On the following six days, cell counts 
were taken, doubling times calculated and 1ml samples 
of supernatant removed from the flask and analyzed for 
secreted anti-CD20 by ELISA.
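
The doubling times in Table 2 can be approximated from the daily viable cell counts. The Python sketch below assumes exponential growth between counts taken 24 hours apart and uses the standard doubling-time formula; the source does not state which calculation was actually used, so this is an illustration rather than the authors' method. Because only ratios of counts enter the formula, the scale factor of the cell count column does not affect the result.

```python
import math

# Viable cell counts from Table 2, days 1 to 4 (same units each day,
# so the scale factor cancels in the ratio).
viable = {1: 3.4, 2: 6.0, 3: 9.9, 4: 17.4}
# Doubling times (hours) reported in Table 2, for comparison.
reported_tx2 = {2: 29, 3: 33, 4: 30}


def doubling_time_hr(n_start, n_end, interval_hr=24.0):
    """Doubling time assuming exponential growth over the interval."""
    return interval_hr * math.log(2) / math.log(n_end / n_start)


computed = {
    day: doubling_time_hr(viable[day - 1], viable[day])
    for day in reported_tx2
}

for day, tx2 in computed.items():
    print(f"day {day}: computed {tx2:.1f} hr, reported {reported_tx2[day]} hr")
```

Under these assumptions the computed values fall within about an hour of the reported doubling times for days 2 to 4.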



This clone is secreting on average 3-5pg
antibody/cell/day, based on this ELISA data. This is the
same level as obtained from other high expressing single
copy clones obtained previously in our laboratory using
the previously developed translationally impaired random
integration vectors. This result indicates the
following:

(1) that the site in the CHO genome marked by the
Desmond marking vector is highly transcriptionally
active, and therefore represents an excellent site from
which to express recombinant proteins, and

(2) that targeting by means of homologous
recombination can be accomplished using the subject
vectors and occurs at a frequency high enough to make
this system a viable and desirable alternative to random
integration methods.

To further demonstrate the efficacy of this system,
we have also demonstrated that this site is amplifiable,
resulting in even higher levels of gene expression and
protein secretion. Amplification was achieved by
plating serial dilutions of 20F4 cells, starting at a
density of 2.5 x 10^ cells/ml, in 96 well tissue culture
dishes, and culturing these cells in media (CHO-S-SFMII)
supplemented with 5, 10, 15 or 20nM methotrexate.
Antibody secreting clones were screened using standard
ELISA techniques, and the highest producing clones were
expanded and further analyzed. A summary of this
amplification experiment is presented in Table 3 below.

Table 3:

Summary of 20F4 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l 96 well       Expanded   pg/c/d from spinner
10       56         3-13               4          10-15
15       27         2-14               3          15-18
20       17         4-11               1          ND



Methotrexate amplification of 20F4 was set up as
described in the text, using the concentrations of
methotrexate indicated in the above table. Supernatants
from all surviving 96 well colonies were assayed by
ELISA, and the range of anti-CD20 expressed by these
clones is indicated in column 3. Based on these
results, the highest producing clones were expanded to
120ml spinners and several ELISAs conducted on the
spinner supernatants to determine the pg/cell/day
expression levels, reported in column 5.



The data here clearly demonstrate that this site can be
amplified in the presence of methotrexate. Clones from
the 10 and 15nM amplifications were found to produce on
the order of 15-20pg/cell/day.

A 15nM clone, designated 20F4-15A5, was selected as
the highest expressing cell line. This clone originated
from a 96 well plate in which only 22 wells grew, and
was therefore assumed to have arisen from a single cell.
The clone was then subjected to a further round of
methotrexate amplification. As described above, serial
dilutions of the culture were plated into 96 well dishes
and cultured in CHO-S-SFMII medium supplemented with
200, 300 or 400nM methotrexate. Surviving clones were
screened by ELISA, and several high producing clones
were expanded to spinner cultures and further analyzed.
A summary of this second amplification experiment is
presented in Table 4.

Table 4:

Summary of 20F4-15A5 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l 96 well       Expanded   pg/c/d, spinner
200      67         23-70              1          50-60
250      86         21-70              4          55-60
300      81         15-75              3          40-50



Methotrexate amplifications of 20F4-15A5 were set up
and assayed as described in the text. The highest
producing wells, the numbers of which are indicated in
column 4, were expanded to 120ml spinner flasks. The
expression levels of the cell lines derived from these
wells are recorded as pg/c/d in column 5.

The highest producing clone came from the 250nM metho- 
trexate amplification. The 250nM clone, 20F4-15A5-250A6 
originated from a 96 well plate in which only wells
grew, and therefore is assumed to have arisen from a
single cell. Taken together, the data in Tables 3 and 4
strongly indicate that two rounds of methotrexate
amplification are sufficient to reach expression levels
of approximately 60pg/cell/day, which is approaching the
maximum secretion capacity of immunoglobulin in
mammalian cells (Reff, M.E., Curr. Opin. Biotech.,
4:573-576 (1993)). The ability to reach this secretion
capacity with just two amplification steps further
enhances the utility of this homologous recombination
system. Typically, random integration methods require
more than two amplification steps to reach this
expression level and are generally less reliable in
terms of the ease of amplification. Thus, the homologous
system offers a more efficient and time saving method of
achieving high level gene expression in mammalian cells.
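
The gain from each amplification round can be summarized from the spinner flask ranges quoted in this example (3-5pg/cell/day before amplification, 15-18 in the 15nM row of Table 3, and 55-60 in the 250nM row of Table 4). The Python sketch below uses the midpoint of each range as a representative value; that choice is an assumption made only for illustration.

```python
# Spinner flask expression ranges (pg/cell/day) quoted in Example 4.
stages = [
    ("unamplified 20F4", 3, 5),
    ("first round, 15nM methotrexate", 15, 18),
    ("second round, 250nM methotrexate", 55, 60),
]


def midpoint(low, high):
    """Representative value for a reported range (an assumption)."""
    return (low + high) / 2


folds = []
for (_, low0, high0), (name, low1, high1) in zip(stages, stages[1:]):
    fold = midpoint(low1, high1) / midpoint(low0, high0)
    folds.append(fold)
    print(f"{name}: about {fold:.1f}-fold over the previous stage")

overall = midpoint(*stages[-1][1:]) / midpoint(*stages[0][1:])
print(f"overall: about {overall:.1f}-fold")
```

On this rough basis, each of the two rounds contributes a several-fold increase in expression.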



EXAMPLE 5

Expression of Anti-Human CD23 Antibody
in Desmond Marked CHO Cells

CD23 is a low affinity IgE receptor which mediates
binding of IgE to B and T lymphocytes (Sutton, B.J., and
Gould, H.J., Nature, 366:421-428 (1993)). Anti-human
CD23 monoclonal antibody 5E8 is a human gamma-1
monoclonal antibody recently cloned and expressed in our
laboratory. This antibody is disclosed in commonly
assigned Serial No. 08/803,085, filed on February 20,
1997.

The heavy and light chain genes of 5E8 were cloned
into the mammalian expression vector N5KG1, a derivative
of the vector NEOSPLA (Barnett et al, in Antibody
Expression and Engineering, H.Y Yang and T. Imanaka,
eds., pp27-40 (1995)) and two modifications were then
made to the genes. We have recently observed somewhat
higher secretion of immunoglobulin light chains compared
to heavy chains in other expression constructs in the
laboratory (Reff et al, 1997, unpublished observations).
In an attempt to compensate for this deficit, we altered
the 5E8 heavy chain gene by the addition of a stronger
promoter/enhancer element immediately upstream of the
start site. In subsequent steps, a 2.9kb DNA fragment
comprising the 5E8 modified light and heavy chain genes
was isolated from the N5KG1 vector and inserted into the
targeting vector Mandy. Preparation of 5E8-containing
Molly and electroporation into Desmond 15C9 CHO cells
was essentially as described in the preceding section.

One modification to the previously described
protocol was in the type of culture medium used. Desmond
marked CHO cells were cultured in protein-free CD-CHO
medium (Gibco-BRL, catalog # AS21206) supplemented with
3mg/L recombinant insulin (3mg/mL stock, Gibco-BRL,
catalog # AS22057) and 8mM L-glutamine (200mM stock,
Gibco-BRL, catalog # 25030-081). Subsequently,
transfected cells were selected in the above medium
supplemented with 400μg/mL Geneticin. In this experiment,
20 electroporations were performed and plated into 96
well tissue culture dishes. Cells grew and secreted
anti-CD23 in a total of 68 wells, all of which were
assumed to be clones originating from a single G418
resistant cell. Twelve of these wells were expanded to
120ml spinner flasks for further analysis. We believe the
increased number of clones isolated in this experiment
(68 compared with 10 for anti-CD20 as described in
Example 4) is due to a higher cloning efficiency and
survival rate of cells grown in CD-CHO medium compared
with CHO-S-SFMII medium. Expression levels for those
clones analyzed in spinner culture ranged from
0.5-3pg/c/d, in close agreement with the levels seen for
the anti-CD20 clones. The highest producing anti-CD23
clone, designated 4H12, was subjected to methotrexate
amplification in order to increase its expression
levels. This amplification was set up in a manner
similar to that described for the anti-CD20 clone in
Example 4. Serial dilutions of exponentially growing
4H12 cells were plated into 96 well tissue culture dishes
and grown in CD-CHO medium supplemented with 3mg/L
insulin, 8mM glutamine and 30, 35 or 40nM methotrexate. A
summary of this amplification experiment is presented in
Table 5.



Table 5:

Summary of 4H12 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l 96 well       Expanded   pg/c/d from spinner
30       100        6-24               8          10-25
35       64         4-27               2          10-15
40       96         4-20               1          ND



The highest expressing clone obtained was a 30nM clone,
isolated from a plate on which 22 wells had grown.
This clone, designated 4H12-30G5, was reproducibly
secreting 18-22pg antibody per cell per day. This is
the same range of expression seen for the first
amplification of the anti-CD20 clone 20F4 (clone
20F4-15A5, which produced 15-18pg/c/d, as described in
Example 4). These data serve to further support the
observation that amplification at this marked site in
CHO is reproducible and efficient. A second
amplification of this 30nM cell line is currently
underway. It is anticipated that saturation levels of
expression will be achievable for the anti-CD23 antibody
in just two amplification steps, as was the case for
anti-CD20.



EXAMPLE 6

Expression of Immunoadhesin in Desmond Marked CHO Cells

CTLA-4, a member of the Ig superfamily, is found on
the surface of T lymphocytes and is thought to play a
role in antigen-specific T-cell activation (Dariavach et
al, Eur. J. Immunol., 18:1901-1905 (1988); and Linsley
et al, J. Exp. Med., 174:561-569 (1991)). In order to
further study the precise role of the CTLA-4 molecule in
the activation pathway, a soluble fusion protein
comprising the extracellular domain of CTLA-4 linked to a
truncated form of the human IgG1 constant region was
created (Linsley et al (Id.)). We have recently
expressed this CTLA-4 Ig fusion protein in the mammalian
expression vector BLECH1, a derivative of the plasmid
NEOSPLA (Barnett et al, in Antibody Expression and
Engineering, H.Y Yang and T. Imanaka, eds., pp27-40
(1995)). An 800bp fragment encoding the CTLA-4 Ig was
isolated from this vector and inserted between the SacII
and BglII sites in Molly.

Preparation of CTLA-4Ig-Molly and electroporation
into Desmond clone 15C9 CHO cells were performed as
described in the previous example relating to anti-CD20.
Twenty electroporations were carried out, and the cells
were plated into 96 well culture dishes as described
previously. Eighteen CTLA-4 expressing wells were isolated
from the 96 well plates and carried forward to the 120 ml
spinner stage. Southern analyses on genomic DNA isolated
from each of these clones were then carried out to
determine how many of the homologous clones contained
additional random integrants. Genomic DNA was digested
with BglII and probed with a PCR generated digoxigenin
labelled probe to the human IgG1 constant region. The
results of this analysis indicated that 85% of the CTLA-4
clones are homologous integrants only; the remaining 15%
contained one additional random integrant. This result
corroborates the findings from the expression of
anti-CD20 discussed above, where 80% of the clones were
single homologous integrants. Therefore, we can conclude
that this expression system reproducibly yields single
targeted homologous integrants in at least 80% of all
clones produced.
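
By way of illustration only, the tally behind percentages of this kind can be sketched in Python. The band counts below are hypothetical placeholders and are not the data of this Example, and the function names are introduced here purely for the sketch. It assumes that a clone showing no band beyond the expected targeted band is scored as a single homologous integrant, and that any extra band is scored as an additional random integrant.

```python
from collections import Counter


def classify_clone(extra_bands):
    """Label a clone from the number of unexpected bands on its Southern blot."""
    if extra_bands < 0:
        raise ValueError("band count cannot be negative")
    return "homologous only" if extra_bands == 0 else "homologous + random"


def summarize(extra_bands_per_clone):
    """Return the percentage of clones falling in each class."""
    labels = Counter(classify_clone(n) for n in extra_bands_per_clone)
    total = sum(labels.values())
    if total == 0:
        return {}
    return {label: 100.0 * count / total for label, count in labels.items()}


# Hypothetical data: 20 clones, 3 of which show one extra band.
hypothetical_counts = [0] * 17 + [1] * 3
print(summarize(hypothetical_counts))
```

With these made-up counts, 17 of 20 clones are scored as homologous integrants only; real scoring would of course rest on the band pattern of each blot.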

Expression levels for the homologous CTLA4-Ig
clones ranged from 8-12 pg/cell/day. This is somewhat
higher than the range reported for the anti-CD20 antibody
and anti-CD23 antibody clones discussed above. However,
we have previously observed that expression of this
molecule using the intronic insertion vector system also
resulted in significantly higher expression levels than
are obtained for immunoglobulins. We are currently
unable to provide an explanation for this observation.

EXAMPLE 7 

Targeting Anti-CD20 to an Alternate Desmond Marked CHO
Cell Line

As we described in a preceding section, we obtained
5 single copy Desmond marked CHO cell lines (see Figures
4 and 5). In order to demonstrate that the success of
our targeting strategy is not due to some unique
property of Desmond clone 15C9 and limited only to this
clone, we introduced anti-CD20 Molly into Desmond clone 9B2
(lane 6 in Figure 4, lane 1 in Figure 5). Preparation
of Molly DNA and electroporation into Desmond 9B2 were
exactly as described in the previous example pertaining
to anti-CD20. We obtained one homologous integrant from
this experiment. This clone was expanded to a 120 ml
spinner flask, where it produced on average 1.2 pg
anti-CD20/cell/day. This is considerably lower expression
than we observed with Molly targeted into Desmond 15C9.
However, this was the anticipated result, based on our
northern analysis of the Desmond clones. As can be seen
in Figure 5, mRNA levels from clone 9B2 are considerably
lower than those from 15C9, indicating the site in this
clone is not as transcriptionally active as that in
15C9. Therefore, this experiment not only demonstrates
the reproducibility of the system - presumably any
marked Desmond site can be targeted with Molly - it also
confirms the northern data that the site in Desmond 15C9
is the most transcriptionally active.
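
As a rough sketch of how northern data might guide the choice among marked sites, the Python fragment below ranks clones by relative mRNA signal. The signal values are hypothetical placeholders in arbitrary units (the actual levels are those shown in Figure 5), and `rank_marked_sites` is a name introduced here for illustration only.

```python
def rank_marked_sites(relative_mrna):
    """Return clone names ordered from highest to lowest mRNA signal."""
    return sorted(relative_mrna, key=relative_mrna.get, reverse=True)


# Hypothetical relative northern signals (arbitrary units), illustration only.
hypothetical_signal = {"9B2": 0.2, "15C9": 1.0}
ranking = rank_marked_sites(hypothetical_signal)
print(ranking[0])  # clone with the strongest signal in this made-up data
```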

From the foregoing, it will be appreciated that,
although specific embodiments of the invention have been
described herein for purposes of illustration, various
modifications may be made without departing from the
scope of the invention. Accordingly, the invention is
not limited except as by the appended claims.

WHAT IS CLAIMED IS:

1. A method for inserting a desired DNA at a
target site in the genome of a mammalian cell which
comprises the following steps:

(i) transfecting or transforming a mammalian cell
with a first plasmid ("marker plasmid") containing the
following sequences:

(a) a region of DNA that is heterologous to
the mammalian cell genome which when integrated in the
mammalian cell genome provides a unique site for
homologous recombination;

(b) a DNA fragment encoding a portion of a
first selectable marker protein; and

(c) at least one other selectable marker DNA
that provides for selection of mammalian cells which
have been successfully integrated with the marker
plasmid;

(ii) selecting a cell which contains the marker
plasmid integrated in its genome;

(iii) transfecting or transforming said selected
cell with a second plasmid ("target plasmid") which
contains the following sequences:

(a) a region of DNA that is identical or is
sufficiently homologous to the unique region in the
marker plasmid such that this region of DNA can
recombine with said DNA via homologous recombination;

(b) a DNA fragment encoding a portion of the
same selectable marker contained in the marker plasmid,
wherein the active selectable marker protein encoded by
said DNA is only produced if said fragment is expressed
in association with the fragment of said selectable
marker DNA contained in the marker plasmid; and

(iv) selecting cells which contain the target
plasmid integrated at the target site by screening for the
expression of the first selectable marker protein.

2. The method of Claim 1, wherein the DNA
fragment encoding a fragment of a first selectable marker is
an exon of a dominant selectable marker.

3. The method of Claim 2, wherein the second
plasmid contains the remaining exons of said first
selectable marker.

4. The method of Claim 3, wherein at least one
DNA encoding a desired protein is inserted between said
exons of said first selectable marker contained in the
target plasmid.

5. The method of Claim 4, wherein a DNA encoding a
dominant selectable marker is further inserted between
the exons of said first selectable marker contained in
the target plasmid to provide for co-amplification of
the DNA encoding the desired protein.

6. The method of Claim 3, wherein the first
dominant selectable marker is selected from the group
consisting of neomycin phosphotransferase, histidinol
dehydrogenase, dihydrofolate reductase, hygromycin
phosphotransferase, herpes simplex virus thymidine kinase,
adenosine deaminase, glutamine synthetase, and
hypoxanthine-guanine phosphoribosyl transferase.

7. The method of Claim 4, wherein the desired
protein is a mammalian protein.

8. The method of Claim 7, wherein the protein is
an immunoglobulin.

9. The method of Claim 1, which further comprises
determining the RNA levels of the selectable marker (c)
contained in the marker plasmid prior to integration of
the target vector.

10. The method of Claim 9, wherein the other
selectable marker contained in the marker plasmid is a
dominant selectable marker selected from the group
consisting of histidinol dehydrogenase, herpes simplex
thymidine kinase, hygromycin phosphotransferase,
adenosine deaminase and glutamine synthetase.

11. The method of Claim 1, wherein the mammalian
cell is selected from the group consisting of Chinese
hamster ovary (CHO) cells, myeloma cells, baby hamster
kidney cells, COS cells, NSO cells, HeLa cells and NIH
3T3 cells.

12. The method of Claim 11, wherein the cell is a
CHO cell.

13. The method of Claim 1, wherein the marker
plasmid contains the third exon of the neomycin
phosphotransferase gene and the target plasmid contains the
first two exons of the neomycin phosphotransferase gene.

14. The method of Claim 1, wherein the marker
plasmid further contains a rare restriction endonuclease
sequence which is inserted within the region of
homology.



15. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is a bacterial DNA, a viral DNA or a synthetic DNA.

16. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is at least 300 nucleotides.

17. The method of Claim 16, wherein the unique
region of DNA ranges in size from about 300 nucleotides
to 20 kilobases.

18. The method of Claim 17, wherein the unique
region of DNA preferably ranges in size from 2 to 10
kilobases.

19. The method of Claim 1, wherein the first
selectable marker DNA is split into at least three
exons.

20. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is a bacterial DNA, an insect DNA, a viral DNA or a
synthetic DNA.

21. The method of Claim 20, wherein the unique
region of DNA does not contain any functional genes.

22. A vector system for inserting a desired DNA at
a target site in the genome of a mammalian cell which
comprises at least the following:

(i) a first plasmid ("marker plasmid") containing
at least the following sequences:

(a) a region of DNA that is heterologous to
the mammalian cell genome which when integrated in the
mammalian cell genome provides a unique site for
homologous recombination;

(b) a DNA fragment encoding a portion of a
first selectable marker protein; and

(c) at least one other selectable marker DNA
that provides for selection of mammalian cells which
have been successfully integrated with the marker
plasmid; and

(ii) a second plasmid ("target plasmid") which
contains at least the following sequences:

(a) a region of DNA that is identical or is
sufficiently homologous to the unique region in the
marker plasmid such that this region of DNA can
recombine with said DNA via homologous recombination;

(b) a DNA fragment encoding a portion of the
same selectable marker contained in the marker plasmid,
wherein the active selectable marker protein encoded by
said DNA is only produced if said fragment is expressed
in association with the fragment of said selectable
marker DNA contained in the marker plasmid.

23. The vector system of Claim 22, wherein the DNA
fragment encoding a fragment of a first selectable
marker is an exon of a dominant selectable marker.

24. The vector system of Claim 23, wherein the
second plasmid contains the remaining exons of said
first selectable marker.

25. The vector system of Claim 24, wherein at
least one DNA encoding a desired protein is inserted
between said exons of said first selectable marker
contained in the target plasmid.

26. The vector system of Claim 24, wherein a DNA
encoding a dominant selectable marker is further
inserted between the exons of said first selectable marker
contained in the target plasmid to provide for
co-amplification of the DNA encoding the desired protein.

27. The vector system of Claim 24, wherein the
first dominant selectable marker is selected from the
group consisting of neomycin phosphotransferase,
histidinol dehydrogenase, dihydrofolate reductase,
hygromycin phosphotransferase, herpes simplex virus
thymidine kinase, adenosine deaminase, glutamine
synthetase, and hypoxanthine-guanine phosphoribosyl
transferase.

28. The vector system of Claim 25, wherein the
desired protein is a mammalian protein.



29. The vector system of Claim 28, wherein the
protein is an immunoglobulin.

30. The vector system of Claim 22, wherein the
other selectable marker contained in the marker plasmid
is a dominant selectable marker selected from the group
consisting of histidinol dehydrogenase, herpes simplex
thymidine kinase, hygromycin phosphotransferase,
adenosine deaminase and glutamine synthetase.

31. The vector system of Claim 22, which provides
for insertion of a desired DNA at a targeted site in the
genome of a mammalian cell selected from the group
consisting of Chinese hamster ovary (CHO) cells, myeloma
cells, baby hamster kidney cells, COS cells, NSO cells,
HeLa cells and NIH 3T3 cells.

32. The vector system of Claim 31, wherein the
mammalian cell is a CHO cell.

33. The vector system of Claim 22, wherein the
marker plasmid contains the third exon of the neomycin
phosphotransferase gene and the target plasmid contains
the first two exons of the neomycin phosphotransferase
gene.

34. The vector system of Claim 22, wherein the
marker plasmid further contains a rare restriction
endonuclease sequence which is inserted within the region of
homology.

35. The vector system of Claim 22, wherein the
unique region of DNA that provides for homologous
recombination is a bacterial DNA, a viral DNA or a synthetic
DNA.

36. The vector system of Claim 22, wherein the
unique region of DNA (a) contained in the marker plasmid
vector system that provides for homologous recombination
is at least 300 nucleotides.

37. The vector system of Claim 36, wherein the
unique region of DNA ranges in size from about 300
nucleotides to 20 kilobases.

38. The vector system of Claim 37, wherein the
unique region of DNA preferably ranges in size from 2 to
10 kilobases.

39. The vector system of Claim 22, wherein the
first selectable marker DNA is split into at least three
exons.

40. The vector system of Claim 22, wherein the
unique region of DNA that provides for homologous
recombination is a bacterial DNA, an insect DNA, a viral DNA
or a synthetic DNA.

41. The vector system of Claim 40, wherein the
unique region of DNA does not contain any functional
genes.
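
The split-marker selection recited in the claims above can be pictured with a minimal Python sketch. It is a simplified logical model offered for illustration, not an implementation of the claimed method: the class and function names and the `unique_region` label are assumptions introduced here, the three-exon split follows Claims 13 and 33, and the size classes follow Claims 16-18 and 36-38. Desmond and Molly are used as the marker and target plasmids of the Examples.

```python
from dataclasses import dataclass

ALL_EXONS = frozenset({1, 2, 3})  # marker treated as three exons (Claims 13 and 33)


@dataclass(frozen=True)
class Plasmid:
    name: str
    homology_id: str          # hypothetical label for the unique region of DNA
    marker_exons: frozenset   # exons of the first selectable marker carried


def marker_is_active(marker, target, targeted_integration):
    """Active marker arises only when both fragments are brought together."""
    if not targeted_integration:
        return False  # a random integrant carries only part of the marker
    if marker.homology_id != target.homology_id:
        return False  # no shared region, so no homologous recombination
    return marker.marker_exons | target.marker_exons == ALL_EXONS


def homology_size_class(length_nt):
    """Classify a homology region length against the claimed size ranges."""
    if length_nt < 300:
        return "below claimed minimum"
    if 2_000 <= length_nt <= 10_000:
        return "preferred range"
    if length_nt <= 20_000:
        return "within claimed range"
    return "above 20 kilobases"


desmond = Plasmid("Desmond", "unique_region", frozenset({3}))
molly = Plasmid("Molly", "unique_region", frozenset({1, 2}))
print(marker_is_active(desmond, molly, targeted_integration=True))
print(marker_is_active(desmond, molly, targeted_integration=False))
print(homology_size_class(5_000))
```

The model deliberately ignores splicing, promoter placement and expression level; it captures only the point that neither plasmid alone encodes a complete first selectable marker.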
HD = Salmonella HisD Gene

N3 = Neomycin Phosphotransferase Exon 3

D = Murine Dihydrofolate reductase

E = Cytomegalovirus and SV40 Enhancers

SA = Splice acceptor

BT = Mouse Beta Globin Major Promoter

B = Bovine Growth Hormone Polyadenylation

S = SV40 Early Polyadenylation

SV = SV40 Late Polyadenylation

FIGURE 1A

FIGURE 1B

EcoR I

Sac II

EcoR I

D = Dihydrofolate reductase

N1 = Neomycin Phosphotransferase Exon 1

N2 = Neomycin Phosphotransferase Exon 2

VL = Anti-CD20 Light chain Leader + Variable

K = Human Kappa Constant

VH = Anti-CD20 Heavy chain Leader + Variable

G1 = Human Gamma 1 Constant

HD = Salmonella Histidinol Dehydrogenase

E = CMV and SV40 enhancers

S = SV40 Origin

SD = Splice donor

SA = Splice acceptor

C = CMV promoter/enhancer

T = HSV TK promoter and Polyoma enhancers

BT = Mouse Beta Globin Major Promoter

SV = SV40 Late Polyadenylation

B = Bovine Growth Hormone Polyadenylation

FIGURE 2A

Southern Analysis of Desmond Marked CHO Cells

FIGURE 4

FIGURE 5

Southern Analysis of Anti-CD20
Integrants in Marked CHO Cells

FIGURE 6

Desmond

FIGURE 7

10 20 30 40 50 60

TTTCTAGACC TA6GGCCCCC ACCTACTAGC TTT6CTTCTC AATTTCTTAT TTGCATAAT6

70 80 90 100 110 120

AGAAAAAAAG GAAAATTAAT TTTAAaCCA ATTCA6TAGT TGATTGAGCA AAT6CGTTGC 

130 140 150 160 170 180 

CAAAAA66AT GCTTTAGAGA CAGTGTTCtC T6CACA6ATA AG6ACAAACA TTATTCAGAG 

190 200 210 220 230 240 

GGAGTACCCA GAGCT6A6AC TCCTAAGCCA 6TGAGTGGCA CA6CATCCAG GGAGAAATAT 

250 260 270 280 290 300 

GCTTGTCATC ACCGAA6CCT GATTCCGTAG AGCCACACCC TGGTAAGGGC CAATCT6CTC 

310 320 330 340 350 360 

ACACAGGATA GAGAGGG^G GAGCCAGGGC AGAGCATATA AGGTGA6GTA GGATCAGTT6 

370 380 390 400 410 420

CTcAgAT itGCTTCTGA CATAGTTGTG TTGGGAGCTT GGATAGCTTG GGGGGGG6AC 

430 440 450 460 470 480 

AGCTCAGG6C TGCGATTTCG CGCCAAACTT GACGGCAATC CTAGCGTGAA 66CTGGTAGG 

490 500 510 520 530 540 

ATTTTATGCC CGCTGCCATC ATGG7TCGAC CATTiGAACTG CATCGTCGCC GTGTCCCAAA , 

550 560 570 580 590 600

ATATGGGGAT TGGCAAGAAC GGAGWICTAC CCTGGCCTCC GCTCAGGAAC GAGTTCAA6T 

610 620 630 640 650 660

ACTTCCAAAG AATGACCACA ACCfCTTCAG^TGGAAGGTAA ACAGAATCTG 6TGATTATGG . 

670 680 690 700 710 720 

GTAGGAAAAC CTGGTTCTCC A7TCCTGAGA AGAATCGACC TTTAAAGGAC AGAATTAATA 

730 740 750 760 770 780

•.TTCTCAG TAGA6AACTC AAAGAACCAC CACGAGGA6C TCATTTTCTT 6CCAAAA6TT 

790 800 810 820 830 840 

TGGATGATGC CTTAAGACTT ATTGAACAAC CGGAATTGGC AAGTAAAGTA GACATGGTTT 

850 860 870 880 890 900 

GGATAGtCGG A6GCA6TTCT GTTTACCAGG AAGCCAT6AA TCAACCACGC CACCTCA6AC 

910 920 930 940 950 960

TCTTT6T6AC AA6GATCATG CA66AATTTG AAAGTGACAC 6TTTTTCCCA GAAATTGATT 

970 980 990 1000 1010 1020

TG6GGAAATA TAAACTTCTC CCAGAATACC CAGGCGTCCT CTCTGA6GTC CAGGAGGAAA 

1030 1040 1050 1060 1070 1080 

AAGGCATCAA GTATAA6TTT GAAGTCTAC6 AGAAGAAA6A CTAACAGGAA GATGCTTTa 

1090 1100 1110 1120 1130 1140

AGTTCTCT6C TCCCCTCaA AAGCTATGa TTTTTATAAG ACCATG66AC TTTTGCT66C 

1150 1160 1170 1180 1190 1200

7TTA6ATCAG CCTCGACTGT GCCTTCTA6T T6CCA6CCAT CTGTTGTTTG CCCCTCCCCC 

1210 1220 1230 1240 1250 1260 

GTGCCTTCCT TGACCCTGGA AGGTGCCACT CCCACTGTCC TTTCCTAATA AAAT6AG6AA 

1270 1280 1290 1300 1310 1320 

-ATTGCATCGC ATTGTCTGAG TAGGTGTCAT TCTATTCTGG GGGGTGG6GT GGGGCAGGAC

1330 1340 1350 1360 1370 1380

AGCAAG6GGG AGGATTG6GA AGACAATAGC A6GCATGCTG GGGATGCGGT GGGCTCTATG 

1390 1400 1410 1420 1430 1440 

GCTTCTGA66 CGGAAAGAAC CAGCTGG6GC TC6AA6CGGC CGCCCATTTC 6CTGGTG6TC 

1450 1460 1470 1480 1490 1500

A6AT6C666A T6GC6T66GA C6CG6CG6GG AGC6TCACAC T6A66TTTTC C6CCA6ACGC 

1510 1520 1530 1540 1550 1560 

CACTCaCCC A66CGCT6AT GTGCCC6GCT TCTGACaTG CGGTC6C6TT C6GTT6CACT 

1570 1580 1590 1600 1610 1620 

AC6CGTACTG TGAGCCAGAG TTGCCCGGCG CTCTCC66CT GCGGTAGTTC AGGCAGTTCA 

1630 1640 1650 1660 1670 1680 

ATCAACTGTT TACC7TGTGG AGCGACATCC AGAGGCACTT CACCGCTT6C CAGCGGCTTA 

1690 1700 1710 1720 1730 1740 

ATCCAGCG CCACaTCCA GTGCAG6AGC TC6TTATCGC TAT6ACG6AA CA6GTATTC6 

1750 1760 1770 1780 1790 1800 

CTGGTCACTT CGAT6GTTTG CCCGGATAAA CGGAACTGGA AAAACTGCTG CTGGTGTTTT 

1810 1820 1830 1840 1850 1860 

GCTTCC6TCA GCGCTGGATG CGGCGTGCGG TCGGaAAGA CCAGACCGTT CATACAGAAC 

1870 1880 1890 1900 1910 1920 

TGGCGATCGT TCGGCGTATC GCCAAAATCA CCGCCGTAAG CCGACCACGG CTTGCCGTTT 

1930 1940 1950 1960 1970 1980

TCATCATATT TAATaCCGA CT6ATCCACC CAGTCCCA6A C6AAGCCGCC CTGTAAAC66 

1990 2000 2010 2020 2030 2040 

G6ATACTGAC GAAACGCCTG CCAGTATTTA GCGAAACC6C CAAGACTG7T ACCCATC6CG 

2050 2060 2070 2080 2090 2100 

.G6C6TATT CGCAAAC6AT CAGC6GGCGC GTCTCTCaG GTAGCGAAAG CCATTTTTTG 

2110 2120 2130 2140 2150 2160 

ATGGACCATT TCGGCACAGC CGGGAAGGGC TGGTCTTCAT CCACGCGCGC GTACATC6GG 

2170 2180 2190 2200 2210 2220 

CAAATAATAT CGGTG6CCGT GGTGTCGGCT CCGCC6CC7T CATACTCaC CGG6CG66AA 

2230 2240 2250 2260 2270 2280

GGATC6ACA6 ATTT6ATCCA GCGATACA6C GC6TCCT6AT TA6CGCC6TC GCCT6ATTCA 

2290 2300 2310 2320 2330 2340 

TTCCCCA6C6 ACCA6AT6AT CACACTC6G6 T6ATTAC6AT CGCGCTCaC aTTCGCGTT 

2350 2360 2370 2380 2390 2400 

AC6CGTTC6C TCATCGCCGG TAGCCAGCGC GGATCATCGG TCAGACGATT CATTGGCACC 

2410 2420 2430 2440 2450 2460 

ATGCCGT6GG TTTCAATATT GGCTTCATCC ACCACATACA G6CCGTAGCG GTCGCACAGC 

2470 2480 2490 2500 2510 2520 

GTGTACCACA GCGGATGGTT CGGATAATGC GAAUGCGCA CG6CGTTAAA GTTGTTCTGC 

2530 2540 2550 2560 2570 2580 

TTCATCA6CA GGATATCCT6 CACCATCGTC TGCTCATCCA TGACCTGACC ATGCAGA6GA 



2590 2600 2610 2620 2630 2640

TGATGCTCGWaCGGTTAAC GCCTCGAATC AGCAAC6GCT TCCCGTTCAG CA6CAGCAGA 

2650 2660 2670 2680 2690 2700 

CCATTTTCAA TCCGCACCTC GCGGAAACCG ACATCGCAG6 CTTCT6CTTC AATaCCGTG 

2710 2720 2730 2740 2750 2760

CC6TCGGCGG T6T6CA6TTC AACCACCGCA CGATAGAGAT TC66GATTTC GGC6CTCCAC 

2770 2780 2790 2800 2810 2820 

AGTTTCGG6T TTTCGACGTT CAGACGTA6T GT6ACGCGAT CG6CATAACC ACCACGCTCA 

2830 2840 2850 2860 2870 2880 

TCGATAATTT CACCGCC6AA AG6CGCGGTG CCGCTGGCGA CCT6CGTTTC ACCCT6CCAT 

2890 2900 2910 2920 2930 2940 

AAAGAAAaC TTACCC6TA6 GTAGTaCGC AACTCGCCGC ACATCT6AAC TTCAGCCTCC 

2950 2960 2970 2980 2990 3000 

AGTACAGCGC GGCT6AAATC ATCATTAAA6 CGAGTG6CAA CAT6GAAATC GCTGATTTCT 

3010 3020 3030 3040 3050 3060 

iilAGTCGGTT TATGCAGCAA CGAGACGTCA CGGAAAATGC CGCTCATCCG CCACATATCC 

3070 3080 3090 3100 3110 3120

TGATCTTCCA GATAACT6CC 6TCACTCCAG CGCA6CACCA TCACCGCGA6 6CGGTTTTCT 

3130 3140 3150 3160 3170 3180 

CCGGCGCGTA AAAAT6CCCT CAGGTCAAAT TCAGAC6GCA AACGACTGTC CTGGCCGTAA 

3190 3200 3210 3220 3230 3240

CCGACCCAGC GCCCGTTGCA CCACAGATGA AAC6CCGA6T TAACGCCATC AAAAATAATT 

3250 3260 3270 3280 3290 3300 

C6C6TCTG6C CTTCCTGTAG CCAGCTTTCA TCAACATTAA ATGT6A6C6A 6TAACAACCC 

3310 3320 3330 3340 3350 3360 

GTCGGATTCT CCGTGGGAAC AAACGGCGGA TTGACCGTAA T6GGATAGGT CAC6TT66TG 

3370 3380 3390 3400 3410 3420 

iaGATGGGCG CATCGTAACC GTGCATCTGC aGTTTGAGG GGACGACGAC AGTATC6GCC 

3430 3440 3450 3460 3470 3480 

TCAGGAAGAT CGCACTCCAG CCAGCTTTCC G6CACCGCTT CTG6T6CCGG AAACCAGGa 

3490 3500 3510 3520 3530 3540 

AA6C6CCATT CGCCATTUG GCT6C6CAAC TGTTG6GAA6 GGC6ATC6GT GC6GGCCTCT 

3550 3560 3570 3580 3590 3600 

TCGCTATTAC GCCA6CTGGC GAAAGGGG6A TGTGaGCAA GGCGATTAAG TTGGGTAACG 

3610 3620 3630 3640 3650 3660

CCACGGTTTT CCCAGTCACG AC6TT6TAAA AC6ACTTAAT CCGTCGAGGG GCT6CCTC6A 

3670 3680 3690 3700 3710 3720 

A6CAAACGAC CTTCC6TTGT GCAGCCAGCG GC6CCTGCGC CGGTGCCaC AATCGT6CGC 

3730 3740 3750 3760 3770 3780 

6AACAAACTA AACCA6AACA AATTATACCG GCGGCACCGC CGCCACCACC TTCTCCCGTG 

3790 3800 3810 3820 3830 3840 

CCTAACATTC CAGCGCCTCC ACCACaCCA CCACCATCGA TGTCTGAATT GCCGCCC6CT 

3850 3860 3870 3880 3890 3900 

CCACCAATGC CGACGGAACC TCAACCCGCT GCACCTTTA6 ACGACAGACA ACAATT6TTG

3910 3920 3930 3940 3950 3960

GAAGCTATTA GAAACGAAAA AAATC6CACT C6TCTCAGAC CGGCTCTCTT AAGGTACaC 

3970 3980 3990 4000 4010 4020

AAACCAAAAA CGGCGCCCGA AACCAGTACA ATA6TT6A66 T6CCGACTGT GTTGCCTAAA 

4030 4040 4050 4060 4070 4080

GA6ACATTT6 AGCCTAAACC GCCGTCTGCA TCACCGCCAC CACCTCC6CC TCCGCCTCCG 

ccgccaJ??? cgcctgS?? tccaccJSS GTAGAmi? catcagct!? accaccS« 

iticA An fid 4.170 4180 

ccattaSJS ATTTGaG?? tgaaatJSa ccaccgcctg caccatcgct ttctaacgtg 

4210 4220 4230 4240 4250 4260

TT6TCT6AAT TAAAATCG6G CACAGTTAGA TTGAAACCC6 CCCAAAAACG CCCGCAAia 

4270 4280 4290 4300 4310 4320 

\ATAATTC CAAAAAGCTC AACTACAAAT TTGATC6CGG ACGTGTTAGC CGACACAATT 

4330 4340 4350 4360 4370 4380 

AATAGGCGTC GTGTGGCTAT GGCAAAATCG TCTTCGGAAG CAACTTCTAA CGAC6AG6GT 

4390 4400 4410 4420 4430 4440 

TGGGACGACG ACGATAATCG GCCTAATAAA GCTAACAC6C CC6ATGTTAA ATATGTCCAA 

4450 4460 4470 4480 4490 4500 

GCTACTA6T6 GTACCTTAAT TAA6GGGCGG AGAAT6GGC6 GAACTGGGC6 GAGTTAGG6G 

4510 4520 4530 4540 4550 4560

CGGGATwi? GGAGTtIIS GCGGGACTAT GGTTGCTGAC TAATTGAGAT 6CATGCTTT6 

4570 4580 4590 4600 4610 4620

CATAc4ctS CCTGaw" A6CCTGMW CTTTCCACAC CTGGTTGaG ACTAATTGAG 

4630 4640 4650 4660 4670 4680

-SCATCO? TCCATACm TGCCTtCTtt GGACCCTOtS CACTTTCCAC ACCCTAACTt 

AacAcJ??? caca«a1??? attcccS" ttattaS" taatcaJto cmgctSI? 

4750 4760 4770 4780 4790 4800

agttcaJS? ccatatSS agttccJS? tacataactt acggtaaatg gcccgcctgg 

CTGACCkS AACGAcSI? GCCCAT^S? GTCAAtJK? ACGTAxtm CCATAGTAAC 

gccaatISS actttcS?? gacgtcIJtS ggtggaJtJ? TTAC6G^S! CTGCcacn 

4930 4940 4950 4960 4970 4980

GGacrlS? <:AA6T6^?^? atatgcSg tacgccccct attgacgtca atgacggtaa 

ATGGCaS? TGGCAT^R? CCCAGtJS? GACCTtSw GACTTtIctS CTTGCaGTA 

5050 5060 5070 5080 5090 5100

CATCTAC6TA TTA6TCATCG CTATTACCAT GGTGATGCGG TTTTGGCA6T ACATCAATGG 

5110 5120 5130 5140 5150 5160 



GCGTGGiTAG CGGTTTGACT CACGG6GATT TCCAAGTCTC CACCCCATTG ACGTCAATGG 

5170 5180 5190 5200 5210 5220

6A6TTTCTTT TGAA6CTTGG CCGGCCA6CT TTATTTAACG TGTTTACGTC GAGTCAATT6

5230 5240 5250 5260 5270 5280

TACACTAAC6 ACAGTGATGA AAGAAATACA AAAGCGCATA ATATTTT6AA CGACGTCGAA 

5290 5300 5310 5320 5330 5340 

CCTTTATTAC AAAAWAAAC ACAAACGAAT ATCCACAAAG CTAGATTGCT GCTACAA6AT 

5350 5360 5370 5380 5390 5400 

TTGGCAAGTT TT6TGGCGTT GAGCGAAAAT CCA7TAGATA 6TCCA6CCAT CGGTTCGGAA 

5410 5420 5430 5440 5450 5460 

AAACAACCCT TGTTTGAAAC TAATCGAAAC CTATTTTACA AATCTATTGA G6ATTTAATA 

5470 5480 5490 5500 5510 5520 

TTTAAATTCA 6ATATAAAGA C6CT6AAAAT CATTTGAnT TCGCTCTAAC ATACCACCCT 

5530 5540 5550 5560 5570 5580 

AAAGATTATA AATTTAATGA ATTATTAAAA TACATCAGCA ACTATATATT GATA6ACATT 

5590 5600 5610 5620 5630 5640 

. vAGTTTGT GATATTA6TT TGTGCGTCTC ATTACAATGG CT6TTATTTT TAACAACAAA 

5650 5660 5670 5680 5690 5700

CAACT6CTCG CA6ACAATAG TATA6AAAAG 66AGGTGAAC TGTTTTT6TT TAACGGTTCG 

5710 5720 5730 5740 5750 5760

TACAACATTT TGGAAAGTTA TGTTAATCCG GTGCTGCTAA AAAAT6GT6T AATTGAACTA 

5770 5780 5790 5800 5810 5820 

GAA6AAGCTG CGTACTATGC CGGCAACATA TTGTACAAAA CC6ACGATCC CAAATTCATT 

5830 5840 5850 5860 5870 5880 

6ATTATATAA ATTTAATAAT TAAAGCAACA CACTCCGAAG AACTACCAGA AAATAGCACT 

5890 5900 5910 5920 5930 5940 

GTTGTAAATT ACAGAAAAAC TATGCGCACC GGTACTATAC ACCCCATTAA AAAAGACATA 

5950 5960 5970 5980 5990 6000

...fATTTATG ACAACAAAAA ATTTACTCTA TACGATAGAT AaTATATGG ATACGATAAT 

6010 6020 6030 6040 6050 6060 

AACTAT6TTA ATTTTTATGA 6GAGAAAAAT GAAAAAGAGA A6GAATACGA AGAA6AAGAC 

6070 6080 6090 6100 6110 6120 

GACAA6GCGT CTA6TTTATG TGAAAATAAA ATTATATTGT C6CAAATTAA CT6TGAATCA 

6130 6140 6150 6160 6170 6180 

TTT6AAAAT6 ATTTTAAATA TTACCTCAGC GATTATAACT ACGCGTTTTC AATTATAGAT 

6190 6200 6210 6220 6230 6240

AATACTACAA ATGTTCTTGT T6CGTTTGGT TTGTATCGTT AATAAAAAAC AAATTTGACA 

6250 6260 6270 6280 6290 6300

TTTATAATT6 TTTTATTATT CAATAATTAC AAATA6GA1T GA6ACCCTTG CAGTTGCaG 

6310 6320 6330 6340 6350 6360 

CAAACGGACA GAGCTTGTCG AG6AGA6TTG TTGATTaTT 6TTT6CCTCC CTGCTGCG6T 

6370 6380 6390 6400 6410 6420 

TTTTCACC6A A6TTCATGCC A6TCCA6C6T TTTTGCAGCA GAAAAGCCGC CGACTTC6GT 

6430 6440 6450 6460 6470 6480 

TT6C66TCGC GAGTGAA6AT CCCTTTCTT6 TTACCGCCAA C6C6CAATAT 6CCTT6CGA6

6490 6500 6510 6520 6530 6540

6TCGCAAAAT CGGCGAAATT CCATACCTGT TCACCGACGA CGGCGCTGAC 6CGATCAAAG

6550 6560 6570 6580 6590 6600 

AC6C66TGAT ACATATCCAG CCAT6CACAC TGATACTCTT CACTCaaX GTCC6TCTAC 

6610 6620 6630 6640 6650 6660 

ATTSAGTGCA GCCCG6CTAA C6TATCCACG CCGTATTCGG TGATGATAAT CGGCTGAT6C 

6670 6680 6690 6700 6710 6720 

A6TTTCTCCT GCCAGGCCAG AA6TTCTTTT TCCA6TACCT TCTCT6CC6T TTCCAAATCG 

6730 6740 6750 6760 6770 6780 

CCGCT7TGGA CATACCATCC GTAATAACGG TTCAGGCACA GCACATCAAA GAGATCGaG 

6790 6800 6810 6820 6830 6840 

ATGGTATCGG T6TGAGCGTC GCAGAACATT ACATTGACGC AG6TGATCGG AC6CGTC666 

6850 6860 6870 6880 6890 6900 

TCGA6TTTAC GCGTT6CTTC CGCCAGT6GC GC6AAATATT CCC6T6CACC TT6C66ACC6 

6910 6920 6930 6940 6950 6960 

6TATCC66TT C6TT66CAAT ACTCCACATC ACCAC6CTT6 GGTGbllill GTaCGCGCT 

6970 6980 6990 7000 7010 7020 

ATCAGCTCTT TAATCGCCTG TAAGTGCGCT TGCTGAGTTT CCCCGTTGAC TGCCTCTTCG 

7030 7040 7050 7060 7070 7080 

CTGTACAGTT CTTTCGGCTT GTT6CCC6CT TCGAAACCAA TGCCTAAAGA GAGGTTAAAG 

7090 7100 7110 7120 7130 7140 

CCGACAGCAG CAGTTTCATC AATCACCACG ATGCCATGTT CATCTGCCCA GTCGAGCATC 

7150 7160 7170 7180 7190 7200 

TCTTCAGCGT AAGGGTAAT6 CGAG6TACGG TAG6AGTTGG CCCCAATCCA GTCCATTAAT 

7210 7220 7230 7240 7250 7260 

GCGTGGTCGT GCACCATCAG CACGTTATCG AATCCTTTGC CACGCAA6TC CGCATCTTCA 

7270 7280 7290 7300 7310 7320 

TGACGACCAA AGCCAGTAAA GTAGAACGGT TTGTGGTTAA TCAGGAACTG TTCGCCCTTC 

7330 7340 7350 7360 7370 7380 

ACTGCCACTG ACCGGAT6CC 6AC6C6AAGC 6C6TA6ATAT aCACTCTGT CT6GCTTTT6 

7390 7400 7410 7420 7430 7440 

GCT6TGAC6C ACAGTTCATA 6AGATAACCT TCACCCGGTT GCCA6A6CT6 CG6ATTCACC 

7450 7460 7470 7480 7490 7500 

ACTT6CAAAG TCCCGCTA6T GCCTTGTCCA GTT6CAACCA CCTGTTGATC CGaTCACGC 

7510 7520 7530 7540 7550 7560 

AGTTCAACGC TGACATCACC ATT6GCCACC ACCTGCCAGT CAAaGACGC 6TGGTTACAG 

7570 7580 7590 7600 7610 7620 

TCTTGC6CGA CATGCGTaC aCGGTGATA TCGTCCACCC AGGT6TTCGG CGT6GTGTAG 

7630 7640 7650 7660 7670 7680

A6CATTACGC T6CGATG6AT TCCGGCATAG TTAAAGAAAT CATG6AAGTA AGACT6CTTT 

7690 7700 7710 7720 7730 7740 

TTCTTCCC6T TTTC6TC66T AATCACCATT CCCGGCGGGA TA6TCT6CCA GTTCAGTTCG 

7750 7760 7770 7780 7790 7800 

TTGTTaCAC AAACGGTGAT ACCCCTCGAC GGATTAAAGA CTTCAAGCGG TCAACTATGA

7810 7820 7830 7840 7850 7860

AGAACT6TTC GTCTTCGTCC aCTAAGCTA T6TCTCCA6A ATGTAGCaT CCATCCTTGT 

7870 7880 7890 7900 7910 7920

CAATCAAGGC GTTGGTC6CT TCCGCATTGT TTACATAACC G6ACATAATC ATAG6TCCTC 

7930 7940 7950 7960 7970 7980 

TGACACATAA TTCGCCTCTC TGATTAAC6C CCAGCGTTTT CCCGGTATCC AGATCCAaA 

7990 8000 8010 8020 8030 8040 

CCTTCGCTTC AAAAAATGGA ACAACTTTAC CGACCGCGCC CGGTTTATCA TCCCCCTCGG 

8050 8060 8070 8080 8090 8100 

GTGTAATCAG AATAGCTGAT GTAGTCTCA6 TGAGCCCATA TCCTTGTCGT ATCCCTG6AA 

8110 8120 8130 8140 8150 8160 

GATGGAAGCG TTTTGCAACC GCTTCCCCGA CTTCTTTC6A AAGAGGT6CG CCCCCAGAA6 

8170 8180 8190 8200 8210 8220 

\TTTC6TG TAAATTAGAT AAATCGTATT TGTCAATCAC AGTGCTTTTG 6CGAAGAAT6 

8230 8240 8250 8260 8270 8280 

AAAATAGGGT TGGTACTAGC AAC6CACTTT GAATTTTGTA ATCCT6AAC6 GATCGTAAAA 

8290 8300 8310 8320 8330 8340 

ACA6CTCTTC TTCAAATCTA TACATTAA6A CGACTCGAAA TCCACATATC AAATATCCGA 

8350 8360 8370 8380 8390 8400 

GTGTAGTAAA CATTCCAAAA CCGTGATG6A ATGGAACAAC ACTTAAAATC 6CAGTATCCG 

8410 8420 8430 8440 8450 8460 

GAATGATTTG ATTGCCAAAA ATAGGATCTC TGGCAT6CGA GAATCTGAC6 CAGGCAGTTC 

8470 8480 8490 8500 8510 8520 

TATGCGGAAG 6GCCACACCC TTAGGTAACC CAGTAGATCC AGAGGAATTG TTTTGTCACG 

8530 8540 8550 8560 8570 8580 

CAAAGGAC TCTGGTACAA AATCGTATTC ATTAAAACCG GGAGGTAGAT GAGATGT6AC 

8590 8600 8610 8620 8630 8640 

GAACGTGTAC ATCGACTGAA ATCCCTG6TA ATCCGTTTTA GAATCCATGA TAATAATTTT 

8650 8660 8670 8680 8690 8700 

CTGGATTATT GGTAATTTTT TTT6CACGTT CAAAATTTTT TGCAACCCCT TTTTGGAAAC 

8710 8720 8730 8740 8750 8760 

AAACACTACG GTA66CTGC6 AAAT6TTCAT ACTGTT6A6C AATTaCGTT CATTATAAAT 

8770 8780 8790 8800 8810 8820 

GTCGTTC6CG GGCGCAACTG aACTCCGAT AAATAACGCG CCCAACACCG GCATAAA6AA 

8830 8840 8850 8860 8870 8880

TT6AAGAGAG TTTTaCTGC ATACGACGAT TCTGT6ATTT GTATTCAGCC CATATCGTTT 

8890 8900 8910 8920 8930 8940 

CATAGCTTCr GCCAACCGAA CGGAaiTTC GAA6TATTCC GC6TAC6TGA TGTTCACCTC 

8950 8960 8970 8980 8990 9000 

GATATGT6CA TCTGTAAAA6 GAATT6TTCC AG6AACCAG6 GCGTATCTCT TCATAGCCTT 

9010 9020 9030 9040 9050 9060 

ATGCAGTTGC TCTCCAGCGG TTCCATCCTC TAGCTTTGCT TCTCAATTTC TTATTTGCAT 



9070 9080 9090 9100 9110 9120 

AAT6AGAAAA AAAGGAAAAT TAATTTTAAC ACCAATTCAG TAGTTGATTG AGCAAATGCG

9130 9140 9150 9160 9170 9180

TTGCCAAAAA GGATGCTTTA GAGACAGTGT TCTCTGCACA GATAAGGACA AACATTATTC 

9190 9200 9210 9220 9230 9240 

AGAGGGAGTA CCCAGAGCT6 A6ACTCCTAA GCCAGT6AGT GGCAaGCAT CCAGGGAGAA 

9250 9260 9270 9280 9290 9300 

ATATGCTT6T CATCACCGAA 6CCTGATTCC GTAGAGCCAC ACCCTGGTAA GGGCaATCT 

9310 9320 9330 9340 9350 9360 

6CTCACACA6 GATAGA6AGG GCAGGAGCCA GGGUGAGCA TATAAGGTGA G6TAG6ATCA 

9370 9380 9390 9400 9410 9420 

GTTGCTCCTC ACATTT6CTT CTGACATAGT TGTGTT66GA GCTTGGATCG ATCCACCATG 

9430 9440 9450 9460 9470 9480 

G6CTTCAATA CCCT6ATT6A CTGGAACAGC T6TAGCCCTG AACAGCAGCG T6C6CTGCT6 

9490 9500 9510 9520 9530 9540

K CGTCCGG CGATTTCCGC CTCTGACAGT ATTACCCGGA CG6TCA6C6A TATTTT66AT 

9550 9560 9570 9580 9590 9600 

AATGTAAAAA C6C6CGGTGA CGAT6CCCTG CGTGAATACA GCGCTAAATT T6ATAAAACA 

9610 9620 9630 9640 9650 9660 

GAA6TGACAG C6CTACGCGT CACCCCTGAA GAGATCGCCG CCGCCGGCGC 6CGTCTGAGC 

9670 9680 9690 9700 9710 9720 

GAC6AATTAA AACAG6CGAT GACCGCTGCC GTCAAAAATA TTGAAACGTT CCATTCCGCG 

9730 9740 9750 9760 9770 9780 

CAGACGCTAC CGCCT6TAGA TGTGGAAACC CAGCCAGGCG TGCGTTGCCA GCAGGTTACG 

9790 9800 9810 9820 9830 9840 

CGTCCCGTCT CGTCTGTCGG TCTCTATATT CCCGGC66CT CGGCTCCGCT CTTCTCAACG 

9850 9860 9870 9880 9890 9900 

c -.CTGATGC TG6CGAC6CC GGCGCGCATT GC666AT6CC AGAAG6TCGT TCTGTGCTCG 

9910 9920 9930 9940 9950 9960 

CC6CC6CCCA TCGCTGAT6A AATCCTCTAT GC6GC6CAAC TGTGT6GCGT GCAGGAAATC 

9970 9980 9990 10000 10010 10020 

TTTAACGTCG GCGGCGCGCA GGCGATTGCC GCTCTGGCCT TCGGCAGCGA 6TCCGTACC6 

10030 10040 10050 10060 10070 10080

AAA6T6GATA AAATTTTT6G CCCC6GCAAC GCCTTTGTAA CCGAA6CCAA ACGTCAGGTC 

10090 10100 10110 10120 10130 10140

A6CCAGCGTC TC6ACG6CGC GGCTATC6AT AT6CCA6CCG GGCCGTCT6A A6TACTG6T6 

10150 10160 10170 10180 10190 10200

ATCGCAGAa GCGGCGCAAC ACC6GATTTC 6TC6CTTCTG ACCT6CTCTC CCA6GCTGAG 

10210 10220 10230 10240 10250 10260

CACGGCCCGG ATTCCCA6GT GATCCTGCTG ACGCCTGATG aGACATTGC CCGCAAGGTG 

10270 10280 10290 10300 10310 10320

GC66AGGCGG TAGAACGTCA ACTGGCGGAA CTGCCGC6C6 CGGACACCGC CCGGCAGGCC 

10330 10340 10350 10360 10370 10380 

CT6A6CGCCA GTC6TCTGAT TGTGACCAAA GATTTAGCGC AGTGCGTCGC CATCTCTAAT 

10390 10400 10410 10420 10430 10440 
CAGTATGGGC CGGAACACTT AATCATCCA6 ACCCGCAATG CGC6CGATTT GGTGGATGCG 

10450 10460 10470 10480 10490 10500 

ATTACCA6CG CAGGCTC6GT ATTTCTCG6C GACTGGTCGC CGGAATCCGC CG6TGATTAC 

10510 10520 10530 10540 10550 10560 

6CTTCCGCAA CCAACCATCT T7TACC6ACC TATG6CTATA CTGCTACCT6 TTCCA6CCTT 

10570 10580 10590 10600 10610 10620 

G66TTA6CGG ATTTCCA6AA ACG6AT6ACC GTiaGGAAC TGTCGAAA6C GGGCTTTTCC 

10630 10640 10650 10660 10670 10680 

GCTCTGGCAT CAACCATTGA AACATTGGCG GCGGCAGAAC GTCTGACCGC CaTAAAAAT 

10690 10700 10710 10720 10730 10740 

GCCGTGACCC TGCGCGTAAA C6CCCTCAA6 GAGCAAGCAT GAGCACT6AA AACACTCTCA 

10750 10760 10770 10780 10790 10800 

6C6TC6CTGA CTTAGCCCGT GAAAAT6TCC GCAACCTG6A GATCaCACA T6GATAACAT 

10810 10820 10830 10840 10850 10860 

ACATTGATGA GTTT6GACAA ACCACAACTA GAATCCAGTG AAAAAAATCC TTTATTTCTC 

10870 10880 10890 10900 10910 10920 

AAATTTGTGA TGCTATTGCT TTATTTGTAA CCATTATAA6 CTGCAATAAA CAAGTTAACA 

10930 10940 10950 10960 10970 10980 

ACAACAATTG CATTCATTTT AT6TTTCAGG TTCAGGGGGA G6T6T6GGAG GTTTTTTAAA 

10990 11000 11010 11020 11030 11040 

GCAAGTAAAA CCTCTACAAA TGT6GTATGG CTGATTATGA TCTCTA66GC CG6CCCTCGA 

11050 11060 11070 11080 11090 11100 

CGGCGC6CCT G6CCCCTACT AACTCTCTCC TCCCTCCnT TTCCTGCA6G CTCAAGCC6C 

11110 11120 11130 11140 11150 11160 

CaiGCCCGA CG6CGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG CCGAATATCA 

11170 11180 11190 11200 11210 11220 

TGGTGGAAAA TGGCC6CTTT TCTGGATTCA TCGACTGTGG CCGGCTGGGT GTGGCGGACC 

11230 11240 11250 11260 11270 11280 

GCTATCAGGA CATA6C6TTG GCTACCCGTG ATATTGCTGA AGA6CTTG6C 6GC6AAT6G6 

11290 11300 11310 11320 11330 11340 

CT6ACC6CTT CCTCGTCCIT TACG6TATCG CC6CTCCC6A TTC6CA6CGC ATCGCOTCT 

11350 11360 11370 11380 11390 11400 

ATCGCCTTCT TGAC6A6TTC TTCTGAGCG6 GACTCT6666 TTCGAAAT6A CCGACCAAGC 

11410 11420 11430 11440 11450 11460 

GACGCCCAAC CTGCCATCAC GAGATTTCGA TTCaCCGCC GCCTTCTATG AAAGGTTGGG 

11470 11480 11490 11500 11510 11520 

CTTCGGAATC GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG ATCTCATCa 

11530 11540 11550 11560 11570 11580 

GGAt Ir e ne GCCCACCeCA ACTTGTTTAT TGaGemT AATGGTTAeA AATAAAGeAA 

11590 11600 11610 11620 11630 11640 

TAGCATeACA AATTTCAeAA ATAAAGCATT TTTTTeACTG CATTCTAGTT GTGGTTTGTe 



11650 11660 11670 11680 11690 11700 

eAAACTCATe AATCTATCTT ATeATGTCTG GATCGCGGCC GGTCTCTCTC TAGCCCTAGG 
11710 11720 11730 11740 11750 11760 

TCTACACTTG GCAGAAaTA TCCATCGCGT CCGCCATCTC CAGCAGCCGC AC6C6GC6CA 

11770 11780 11790 11800 11810 11820 

TCTCGGGCAG CGTTGG6TCC TGGCCACGG6 TGCCaiGAT CGTGCTCCTG TC6TT6AG6A 

11830 11840 11850 11860 11870 11880 

CCCG6CTA6G CTCGC6GG6T T6CCTTACTG GTTA6CAGAA T6AATCACCG ATAC6C6A6C 

11890 11900 11910 11920 11930 11940 

6AACGT6AA6 CGACTGCTGC TCaAAACGT CTGC6ACCTG AGCAACAACA TGAATGGTCT 

11950 11960 11970 11980 11990 12000 

TC66TTTCCG TGTTTCGTAA AGTCTGGAAA C6CG6AA6TC AGCGCCCTGC ACCATTATGT 

12010 12020 12030 12040 12050 12060 

TCCGGATCTG CATCGCAG6A TGCTGCTG6C TACCCT6TGG AACACCTAa TCT6TATTAA 

12070 12080 12090 12100 12110 12120 

C6AAGCGCTG 6CATTGACCC T6AGTGATTT TTCTCTGGTC CC6CCGCATC CATACCGCa 

12130 12140 12150 12160 12170 12180 

GTTGTTTACC CJCACAACGT TCCAGTAACC GGGCAT6TTC ATCATCA6TA ACCCGTATCG 

12190 12200 12210 12220 12230 12240 

TGAGCATCCT CTCTCGTTTC ATCGGTATCA TTACCCCCAT GAAWGAAAT CCCCCTTACA 

12250 12260 12270 12280 12290 12300 

CG6AGGCATC AGTGACCAAA CAGGAAAAAA CCGCCCTTAA CATGGCCCGC TTTATCAGAA 

12310 12320 12330 12340 12350 12360 

GCCAGACATT AACGCtTCTG GAGAAACTCA ACGA6CT66A CGCGGAT6AA CAGCaCACA 

12370 12380 12390 12400 12410 12420 

TCT6TGAATC GCTTCACGAC CAC6CT6ATG AGCTTTACCG CAGCTGCCTC GCGCGTTTCG 

12430 12440 12450 12460 12470 12480 

6TGATGACGG TGAAAACCTC TGACACAT6C AGCTCCCGGA GACGGTCACA CCTTGTCTGT 

12490 12500 12510 12520 12530 12540 

AA6CGGATGC CGGGAGCAGA CAAGCCCGTC A6GGCGC6TC A6C6GGTGTT GGC66GTGTC 

12550 12560 12570 12580 12590 12600 

6GGGCGCAGC CATGACCCAG TCACGTAGCG ATAGCCGAGT GTATACTG6C TTAACTATGC 

12610 12620 12630 12640 12650 12660 

6CCATCA6A6 CAGATTGTAC TGA6A6TGCA CCATATCCGG TGTGAAATAC CGCAttGATG 

12670 12680 12690 12700 12710 12720 

C6TAA6GAGA AAATACC6CA TCAGGCGCTC TTCC6CTTCC TC6CTCACTG ACTCGCTGCG 

12730 12740 12750 12760 12770 12780 

CTCG6TCGTT CGGCT6CGGC GAGCGGTATC AGCTCACTCA AAGGCGGTAA TACGGTTATC 

12790 12800 12810 12820 12830 12840 

aCAGAATCA 6GGGATAAC6 CAGGAAAGAA CATGTGAGCA AAAGGCCAGC AAAAG6CCA6 

12850 12860 12870 12880 12890 12900 

GAACC6TAAA AA66CC6C6T TGCT6GCGTT TTTCaTAGG CTCC6CCCCC CT6ACGA6CA 

12910 12920 12930 12940 12950 12960 

XaCAAAAAT CGAC6CTCAA 6TCAGA66T6 GCGAAACCC6 ACAGGACTAT AAAGATACa 

12970 12980 12990 13000 13010 13020 

GGCGTTTCCC CCT66AAGCT CCCTC6TGCG CTCTCCTGTT CCGACCCTGC C6CTTACCGG 
13030 13040 13050 13060 13070 13080 

ATACCTGTCC GCC7TTCTCC CTTCGGGAA6 CGTGGCGCTT TCTCATAGCT CACGCT6TAG 

13090 13100 13110 13120 13130 13140 

GTATCTCA6T TCGGTGTAG6 TCGTTCGCTC CAAGCTGG6C TGTGTGCACG AACCCCCC6T 

13150 13160 13170 13180 13190 13200 

TCAGCCCGAC C6CTCCGCCT TATCCG6TAA CTATC6TCTT GAGTCCAACC CG6TAA6ACA 

13210 13220 13230 13240 13250 13260 

C6ACTTATCG CCACT6GCAG CAGCaCTGG TAACAG6ATT AGCAGA6C6A 6GTATGTAG6 

13270 13280 13290 13300 13310 13320 

CGGTGCTACA GA67TCTTGA AGTGGTG6CC TAACTACGGC TACACTAGAA GGACAGTATT 

13330 13340 13350 13360 13370 13380 

TGGTATCTGC GCTCTGCTGA AGCCAGTTAC CTTCG6AAAA AGAGTT6GTA GCTCTTGATC 

13390 13400 13410 13420 13430 13440 

. ^CAAACAA ACCACC6CTG 6TAGCGGTG6 IlllllldU T6CAAGCAGC AGATTAC6C6 

13450 13460 13470 13480 13490 13500 

CAGAAAAAAA GGATCTCAA6 AAGATCCTTT GATCTTTTCT ACG6GGTCTG ACGCTCAGTG 

13510 13520 13530 13540 13550 13560 

GAACGAAAAC TCACGTTAA6 GGAT7TTGGT CAT6AGATTA laAAAAGGA TCTTCACCTA 

13570 13580 13590 13600 13610 13620 

GATCCTTTTA AATTAAAAAT GAAGTTTTAA ATCAATCTAA A6TATATATG AGTAAACTTG 

13630 13640 13650 13660 13670 13680 

GTCTGACAGT TACCAATGCT TAATCAGTGA 6GCACCTATC TCAGCGATCT 6TCTATTTC6 

13690 13700 13710 13720 13730 13740 

TTCATCCATA 6TTGCCTGAC TCCCCGTC6T GTAGATAACT ACGATACGGG AGGGCTTACC 

13750 13760 13770 13780 13790 13800 

. .CTGGCCCC AGTGCTGaA T6ATACCGCG AGACCCACGC TCACCGGCTC CAGATTTATC 

13810 13820 13830 13840 13850 13860 

A6CAATAAAC CAGCCAGCCG GAA6GGCCGA GCGCAGAAGT 6GTCCTGCAA CTTTATCCGC 

13870 13880 13890 13900 13910 13920 

CTCCATCCA6 TCTATTAATT GTTGCCGGGA AGCTAGA6TA AGTA6TTCGC aGTTAATAG 

13930 13940 13950 13960 13970 13980 

TTT6CGCAAC GTTGTTGCa TTGCTGO^GG CATC6TG6T6 TCAC6CTCGT CGTTTG6TAT 

13990 14000 14010 14020 14030 14040 

CCCTTCATTC A6CTCC6GTT CCCAACGATC AA66CGAGTT ACAT6ATCCC CCAT6TTGTG 

14050 14060 14070 14080 14090 14100 

CAAAAAAGC6 6TTA6CTCCT TC6CTCCTCC GATCGTT6TC AGAA6TAAGT TGGCCGCA6T 

14110 14120 14130 14140 14150 14160 

GTTATaCTC ATG6TTATGG CAGCACTGCA TAATTCTCTT AaGTCATGC CATCCGTAAG 

14170 14180 14190 14200 14210 14220 

AT6CTTTTCT GTGACTGGTG A6TACTCAAC CAAGTCATTC TGAGAATAGT GTAT6C6GCG 

14230 14240 14250 14260 14270 14280 

ACCGAGTT6C TCTT6CCC66 CGTCAACACG GGATAATACC GCGCaCATA GaGAACTTT 



14290 14300 14310 14320 14330 14340 
AAAAGTGCTl ^TCATTGGAA AACCTTCTTC GG6GCGAAAA CTCTCAAGGA TCTTACCGCT 

14350 14360 14370 14380 14390 14400 

GTTGAGATCC ACTTCCATGT [illegible] TGCACCCAAC TGATCTTCAG aTCTTTTAC 

14410 14420 14430 14440 14450 14460 

TTTGICCA6C GTTTCTGGGT [illegible] AGGAAGGCAA AATGCCCaA AAAA66GAAT 

14470 14480 14490 14500 14510 14520 

AA66GC6ACA CGGAAATGTT GAATArrrAT ACTCTTCCTT TTTCAATATT ATTGAACaT 

14530 14540 14550 14560 14570 14580 

TTATCAGGGT TATTGTCTCA TGAGrGGATA CATATTTGAA TGTATTTA6A AAAATAAAa 

14590 14600 14610 14620 14630 14640 

AATAGGGGTT CCGC6CACAT TTCCCCGAAA A6T6CCACCT GACGTCTAAG AAACCATTAT 



14650 14660 14670 14680 14690 14700 

TATCATGACA TTAACCTATA AAAATA6GCG TATCAC6AG6 CCCTTTC6TC TTCAA6AA.. 
10 20 30 40 50 60 



TTAATTAAG6 GGCGGAGAAT GGGCGGAACT C6GC6GAGTT AGGGGCG6GA TG66CGGAGT 

70 80 90 100 110 120 

TAGG66C66G ACTAT6GTTG CT6ACTAATT 6AGATCCAT6 CTTTGCATAC TTCT6CCT6C 

130 140 150 160 170 180 

TCGG6A6CCT GGGGACTTTC CAacCTGGT TGCTGACTAA TTGAGATCa TGCTTTGaT 

190 200 210 220 230 240 

ACTTCT6CCT GCTG6G6AGC CTGGGGACTT TCCAaCCCT AACTGACACA CATTCCACAG 

250 260 270 280 290 300 

AATTAATTCC CCTAGTTATT AATAGTAATC AATTACGGGG TCATTAGTTC ATAGCCCATA 

310 320 330 340 350 360 

TATG6A6TTC C6CGTTACAT AACTTACGGT AAATGGCCCG CCTGGCTGAC CGCCCAACGA 

370 380 390 400 410 420 

:CCCCCCA TT6ACGTCAA TAATGAC6TA TGTTCCCATA GTAACGCaA TAGCGACTTT 

430 440 450 460 470 480 

•CCATTGACGT CAATGGGTGG AGTATTTACG GTAAACTGCC CACTTGGCAG TACATCAAGT 

490 500 510 520 530 540 

GTATCATATG CCAA6TACGC CCCCTATTGA CGTCAAT6AC GGTAAATGGC CCGCCTGCa 

550 560 570 580 590 600 

TTATGCCaC TACAT6ACCT TAT6GGACTT TCCTACTTGG CAGTAaTCT AC6TATTAGT 

610 620 630 640 650 660 

CATCGCTATT ACCATGGTGA T6CGGTTTTG GCAGTACATC AAT666C6T6 GATAGC6GTT 

670 680 690 700 710 720 

T6ACTCACGG G6ATTTCCAA GTCTCCACCC CATTGACGTC AATG6GAGTT TGTTTTGAAG 

730 740 750 760 770 780 

rCGCCGGC CAGCTTTATT TAACGT6TTT ACGTCGAGTC AATTGTACAC TAACGAaGT 

790 800 810 820 830 840 

6AT6AAA6AA ATACAAAAGC GCATAATATT TTGAACGACG TCGAACCTTT ATTAaAAAC 

850 860 870 880 890 900 

AAAACACAAA CGAATATC6A CAAAGCTAGA TTGCTGCTAC AAGA7TTGGC AAGTTTTGTG 

910 920 930 940 950 960 

6CGTTGAGC6 AAAATCaTT AGATAGTCCA GCCATC6GTT CGGAAAAACA ACCCTTGnT 

970 980 990 1000 1010 1020 

GAAACTAATC GAAACCTATT TTACAAATCT ATTGAGCATT TAATATTTAA ATTCAGATAT 

1030 1040 1050 1060 1070 1080 

AAAGAC6CT6 AAAATCATTT GATTTTCGCT CTAACATACC ACCCTAAAGA TTATAAATTT 

1090 1100 1110 1120 1130 1140 

AATGAATTAT TAAAATACAT aCCAACTAT ATATTGATA6 ACATTTCCAG TTTGTGATAT 

1150 1160 1170 1180 1190 1200 

TA6TTTCTGC GTCTCATTAC AATGGCTGTT ATTTTTAAa ACAAAaACT GCTCGCAGAC 

1210 1220 1230 1240 1250 1260 

AATAGTATAG AAAAGGGAGG TGAACTGTTT TT6TTTAACG GTTCGTACAA CAT7TTGGAA 



1270 1280 1290 1300 1310 1320 

AOTTATCTTA ATCCGGT6CT GCTAAAAAAT GGTGTAATTG AACTA6AAGA AGCTGCGTAC 
1330 1340 1350 1360 1370 1380 

TATGCCGCa ACATATTCTA CAAAACC6AC GATCCCAAAT TaTTCATTA TATAAATTTA 

1390 1400 1410 1420 1430 1440 

ATAATTAAA6 CAAaaCTC CGAA6AACTA CCAGAAAATA GCACTGTTGT AAATTACA6A 

1450 1460 1470 1480 1490 1500 

AAAACTATGC GaCCGCTAC TATAaCCCC ATTAAAAAAG ACATATATAT TTATGACAAC 

1510 1520 1530 1540 1550 1560 

AAAAAATTTA CTCTATAC6A TAGATACATA TAT66ATAC6 ATAATAACTA T6TTAATTTT 

1570 1580 1590 1600 1610 1620 

TATGAGGAGA AAAAT6AAAA AGAGAAGGAA TACGAAGAAG AAGACGACAA 6GCGTCTA6T 

1630 1640 1650 1660 1670 1680 

TTAT6T6AAA ATAAAATTAT ATTGTCGCAA ATTAACTCTG AATCATTTGA AAATCATTTT ; 

1690 1700 1710 1720 1730 1740 

aaATATTACC TCAGCGATTA TAACTACGCG TTTTCAATTA TAGATAATAC TACAAATGTT 

1750 1760 1770 1780 1790 1800 

CTTGTTGC6T TTG6TTT6TA TCG7TAATAA AAAACAAATT T6ACATTTAT AATTGTT7TA 

1810 1820 1830 1840 1850 1860 

TTATTCAATA ATTACAAATA GGATTGAGAC CCTTGCA6TT GCCAGCAAAC GGAaGAGCT 

1870 1880 1890 1900 1910 1920 

TGTCGAGGAG A6TT6TTGAT TCATT6TTT6 CCTCCCTGCT 6CG6TTTTTC ACCGAAGTTC 

1930 1940 1950 1960 1970 1980 

AT6CCA6TCC AGCGTTnTG CAGCAGAAAA 6CCGCCGACT TCG6TTTGC6 GTC6C6A6TG 

1990 2000 2010 2020 2030 2040 

AAGATCCCTT TCTT6TTACC GCCAACGCGC AATAT6CCTT GCGAGGTCGC AAAATCG6C6 

2050 2060 2070 2080 2090 2100 

AAATTCCATA CCTGTTCACC 6AC6AC66CG CT6AC6CGAT CAAA6ACGCG GTGATAaTA 

2110 2120 2130 2140 2150 2160 

TCCAGCCATG CACACT6ATA CTCTTaCTC aCATGTCGG TGTACA7T6A CT6CAGCCCG 

2170 2180 2190 2200 2210 2220 

GCTAACGTAT CCACGCCGTA TTC6GTGAT6 ATAATCGGCT GATGCAGTTT CTCCT6CCA6 

2230 2240 2250 2260 2270 2280 

GCCAGAAGTT CTTTTTCaG TACOTCTCT GCC6TTTCCA AATCGCCGCT TTGGACATAC 

2290 2300 2310 2320 2330 2340 

arCCGTAAT AACGGTraC GaaGaCA raAACAGAT CGCT6ATGGT ATCG6TGT6A 

2350 2360 2370 2380 2390 2400 

GC6TC6CAGA ACATTAaiT GACGUGGTC ATC6GACGCG TCGC6TC6A6 TTTACGC6TT 

2410 2420 2430 2440 2450 2460 

GCTTCCGCCA GTGCCGC6AA ATATTCCCGT GCACCTTGCG GACGGGTATC CGGTTCGTTG 

2470 2480 2490 2500 2510 2520 

GCAATACTCC ACATCACCAC 6CTTG6GT6G TTTTTGrCAC GCGCTATCAG CTCTTTAATC 

2530 2540 2550 2560 2570 2580 

6CCTGTAAGT GCGCTTGCTG AGTTTCCCC6 TTGACT6CCT CTTCGCTGTA CAGTTCTTTC 

2590 2600 2610 2620 2630 2640 
GGCTTCTTGC CCGCTTCGAA ACCAATGCCT AAA6A6A66T TAAA6CC6AC ACaGaCTT 

2650 2660 2670 2680 2690 2700 

laiCAATa CCACGATGCC AT6TTCATCT GCCCA6TCGA GCATCTCTTC ACCGTAAGGG 

2710 2720 2730 2740 2750 2760 

TAATGCGAGG TACG6TAGGA GTTGGCCCCA ATCCA6TCCA TTAAT6CGTC 6TCGT6CACC 

2770 2780 2790 2800 2810 2820 

ATCA6CAC6T TATCGAATCC TTTGCCAC6C AAGTCCGCAT CTTaTGACG ACCAAAGCCA 

2830 2840 2850 2860 2870 2880 

6TAAAGTAGA ACG6TTTGTG GTTAATCAGG AACTGTTCGC CCTTCACT6C aCTGACCGG 

2890 2900 2910 2920 2930 2940 

AT6CC6ACGC GAA6C6GGTA GATATaCAC TCT6TCTG6C TTTTGGCTGT GACGaCAGT 

2950 2960 2970 2980 2990 3000 

"-'ATAGAGAT AACCTTCACC CGGTTGCCAG AGGTGCG6AT TacaCTTG CAAAGTCCCG 

3010 3020 3030 3040 3050 3060 

CTAGTGCCTT 6TCCAGTT6C AACaCCTGT TGATGCCaT CAC6CAGTTC AACGCTGACA 

3070 3080 3090 3100 3110 3120 

TCACCATTGG CCACCACCTG CCAGTaACA 6ACGCGT6GT TACAGTCTT6 CGCGACATGC 

3130 3140 3150 3160 3170 3180 

GTCACCACGG TGATATCGTC CACCCA66TG TTC6GCGTGG TGTAGAGaT TACGCTGCGA 

3190 3200 3210 3220 3230 3240 

T66ATTCCGG CATAGTTAAA GAAATCAT6G AAGTAA6ACT GCnTTTnT 6CCGTTTTC6 

3250 3260 3270 3280 3290 3300 

TC6GTAATCA CCATTCCCGG CG6GATAGTC TGCCA6TTCA GTTCGTrGTT aCACAAACG 

3310 3320 3330 3340 3350 3360 

"6ATACCCC TCGACGGATT AAA6ACTTCA A6CGGTCAAC TATGAAGAA6 TGTTCGTCTT 

3370 3380 3390 3400 3410 3420 

CGTCCCAGTA A6CTATGTCT CCA6AATGTA 6CCATCCATC CTTGTCAATC AAGGCGTTGG 

3430 3440 3450 3460 3470 3480 

TC6CTTCC66 ATTGTTTACA TAACCG6ACA TAATCATA66 TCCTCTGAa CATAATTCGC 

3490 3500 3510 3520 3530 3540 

CTCTCT6ATT AAC6CCCAGC GTmCCCGG TATCCA6ATC CACAACCTTC 6CTTCAAAAA 

3550 3560 3570 3580 3590 3600 

ATGGAACAAC TTTACCGACC GCGCCCGGTT TATCATCCCC CTCGG6T6TA ATCAGAATA6 

3610 3620 3630 3640 3650 3660 

CTGATGTAGT CTCAGT6A6C CCATATCCTT GTC6TATCCC TGGAAGATGG AAGCGTnTG 

3670 3680 3690 3700 3710 3720 

CAACCGCTTC CCCGACTTCT TTCGAAAGAG GT6CGCCCCC AGAA6CAATT TC6T6TAAAT 

3730 3740 3750 3760 3770 3780 

TA6ATAAATC GTATTTGTa ATCAGAGTGC TTTTGGCGAA GAATGAAAAT AGGGTTGGTA 

3790 3800 3810 3820 3830 3840 

CTAGCAACGC ACT7TGAATT TTGTAATCCT GAAGGGATCG TAAAAACACC TCTTCTTCAA 

3850 3860 3870 3880 3890 3900 

ATCTATACAT TAAGACGACT C6AAATCCAC ATATCAAATA TCCGAGTGTA GTAAACATTC 




WO 98/41645 PCT/US98/03935 

24 / 51 

Molly 

3910 3920 3930 3940 3950 3960 

CAAAACCGTG ATGGAATG6A ACAAaCTTA AAATCCCAGT ATCC66AAT6 ATTT6ATTSC 

3970 3980 3990 4000 4010 4020 

CAAAAATA6G ATCTCTGCa T6C6A6AATC TGACCaCGC AGTTCTATGC GGAAGGGCCA 

4030 4040 4050 4060 4070 4080 

CACCCTTA66 TAACCaCTA GATCCAGA66 AATT6TTTT6 TCACGATCAA AG6ACTCTGG 

4090 4100 4110 4120 4130 4140 

TACAAAATC6 TATTaTTAA AACC666A6G TA6ATGAGAT 6TGACGAACG TGTACATC6A 

4150 4160 4170 4180 4190 4200 

CTGAAATCCC T6GTAATCC6 mTAGAATC CATGATAATA ATTTTCTGGA TTATTGGTAA 

4210 4220 4230 4240 4250 4260 

TTTTTTTT6C ACGTTaAAA TTTTTTGCAA CCCCTTTTTG GAAACAAAa CTACGGTAGG 

4270 4280 4290 4300 4310 4320 

•.CGAAATG TTCATACTCT TGA6CAATTC ACGTTaTTA TAAATGTC6T TCGCG66C6C 

4330 4340 4350 4360 4370 4380 

AACTCaACT CCGATAAATA AC6C6CCCAA CACCGGCATA AA6AATT6AA 6AGA6TTTTC 

4390 4400 4410 4420 4430 4440 

ACTGCATACG AC6ATTCTGT GATTTCTATT CAGCCCATAT CGTTTCATAG CTTCTGCCAA 

4450 4460 4470 4480 4490 4500 

CCGAACGGAC ATTTCGAAGT ATTCCGC6TA CGTGATGTTC ACCTCGATAJ GTGCATCTGT 

4510 4520 4530 4540 4550 4560 

AAAAGGAATT GTTCCA66AA CCA6GGC6TA TCTCTTCATA GCCTTATGCA GTTGCTCTCC 

4570 4580 4590 4600 4610 4620 

AaCGGTTCCA TCCTCTAaCT TTGCTTCTa ATTTCTTATT TGaTAATGA 6AAAAAAA6G 

4630 4640 4650 4660 4670 4680 

lATTAATT TTAACACCAA TTaCTAGTT GA7T6AGCAA ATGCGTT6CC AAAAAGGATG 

4690 4700 4710 4720 4730 4740 

CTTTAGAGAC A6T6TTCTCT GaCAGATAA GGACAAAaT TATTCAGAGG GAGTACCCAG 

4750 4760 4770 4780 4790 4800 

A6CT6AGACT CCTAA6CCAG TGAGTGGCAC AGCATCCAGG GAGAAATAT6 CTTGTCATa 

4810 4820 4830 4840 4850 4860 

CCGAA6CCT6 ATTCC6TAGA 6CCACACCCT 6GTAA6GGCC AATCT6CTCA aaCGATAG 

4870 4880 4890 4900 4910 4920 

A6A66GCA66 AGCCAGCCa GAGaTATAA GGTGA66TA6 GATaCTTGC TCCTaaTT 

4930 4940 4950 4960 4970 4980 

TGCTTCTCAC ATACTTCT6T T666A6CTT6 6ATC6ATCCA CCATG66CTT CAATACCCTG 

4990 5000 5010 5020 5030 5040 

ATTGACTGGA ACAGCtCTAG CCCTGAACAG CAGCGTGCGC TGCTGACGC6 TCCGGCGATT 

5050 5060 5070 5080 5090 5100 

TCCGCCTCTG ACAGTATTAC CCGGACG6TC AGCGATATTC TGGATAATGT AAAAACGCGC 

5110 5120 5130 5140 5150 5160 

GGTGACGATG CCCTGCGTGA.ATAaGCGCT AAATTTGATA AAACAGAAGT GACAGCGCTA 



5170 5180 5190 5200 5210 5220 

cGccraccc ctgaagagat cgccgccgcc ggcgcgcgtc tgagcgacga attaaaacag 
5230 5240 5250 5260 5270 5280 

6CGAT6ACCG CTGCCGTCAA AAATATTGAA ACCTTCCATT CCGCCaCAC CCTACCGCCT 

5290 5300 5310 5320 5330 5340 

6TA6ATGTG6 AAACCCA6CC AG6CGTGCGT T6CCAGCAGG TTACGCGTCC C6TCTCGTCT 

5350 5360 5370 5380 5390 5400 

GTCGGTCTGT ATATTCCC6G CGGCTCGGCT CC6CTCTTCT CAACGGTGCT GAT6CTG6CG 

5410 5420 5430 5440 5450 5460 

AC6CCGGC6C 6CATTGCG66 ATGCaGAAG 6T66TTCTGT GCTC6CC6CC GCCaTCGCT 

5470 5480 5490 5500 5510 5520 

GATGAAATCC TCTAT6CGGC GCAACTGTGT GGCGTGCAGG AAATCTTTAA CGTCGGCG6C 

5530 5540 5550 5560 5570 5580 

GCGCAGGCGA TTGCCGCTCT GGCCTTCGGC AGCGAGTCCG TACCGAAAGT GGATAAAATT 

5590 5600 5610 5620 5630 5640 

.TGGCCCC6 GCAACGCCTT TGTAACC6AA GCCAAACGTC A6GTCAGCCA 6CGTCTC6AC 

5650 5660 5670 5680 5690 5700 

GGCGCGGCTA TCGATATGCC AGCCGGGCCG TCTGAAGTAC TGGTGATCGC AGAWGCGGC 

5710 5720 5730 5740 5750 5760 

GaACACCGG ATTTCGTCGC 7TCT6ACCTG CTCTCCCA6G aGAGCACGG CCCGGATTCC 

5770 5780 5790 5800 5810 5820 

aCGTGATCC TGCTGACGCC TGATGCT6AC ATTGCCCGCA AGGTG6CGGA GGCGGTA6AA 

5830 5840 5850 5860 5870 5880 

CGTCAACTGG CGGAACT6CC GCGCGCGGAC ACCGCCCGGC AGGCCCTGAG CGCCAGTCGT 

5890 5900 5910 5920 5930 5940 

CT6A7TGTGA CCAAAGATTT A6CGCAGT6C GTCGCCATCT CTAATCA6TA TG6GCCG6AA 

5950 5960 5970 5980 5990 6000 

JlCTTAATa TCCAGAC6CG CAATGCGCGC GAnTGGTGG ATGCGATTAC CAGCGaCGC 

6010 6020 6030 6040 6050 6060 

TC66TATTTC TCGGCGACTG GTC6CCCGAA TCCGCCG6TG ATTACGCTTC CGGAACCAAC 

6070 6080 6090 6100 6110 6120 

CAT6TTTTAC CGACCTAT66 CTATACTGCT ACCTCTTCCA GCCTTGGGTT AGCG6ATTTC 

6130 6140 6150 6160 6170 6180 

CAGAAACGGA TGACCGTTCA 6GAACTGTCG AAAGCGGGCT TTTCCGCTCT GGaTCAACC 

6190 6200 6210 6220 6230 6240 

ATT6AAACAT T6GCG6C66C AGAACGTCT6 ACCGCCaiA AAAATGCCGT GACCCTGCGC 

6250 6260 6270 6280 6290 6300 

G1AAAC6CCC TCAAGaCCA A6CATGAGGC ACTGAAAACA CTCTCA6CGT CGCT6ACTTA 

6310 6320 6330 6340 6350 6360 

GCCCGTGAAA ATCtCCGCAA CCTGGA6ATC CAGACATGAT AA6ATACATT GATGAGTITG 

6370 6380 6390 6400 6410 6420 

GACAAACCAC AACTAGAATG aGTGAAAAA AATGCTTTAT TTGTGAAATT TCTGATGCTA 

6430 6440 6450 6460 6470 6480 

TTGCTTTATT TGTAACCATT ATAA6CTGCA ATAAACAAGT TAACAACAAC AATTGCATTC 



6490 6500 6510 6520 6530 6540 
ATTTTATGTT TaCCTTCAG GCGCAaGTGT GGGA6GTTTT TTAAAGCAA6 TAAAACCTCT 



6550 6560 6570 6580 6590 6600 

ACAAATGTGG TATG6CT6AT TATGATCTCT AG6GCCG6CC CTCGACGGC6 C6CCTCTAGA 

6610 6620 6630 6640 6650 6660 

GCAGTGTGGT TTTGCAAGAG GAA6CAAAAA GCCTCTCCAC CCAGGCCTGG AATGTTTCCA 

6670 6680 6690 6700 6710 6720 

CCCAATGTCG AGCAGTGT66 TTTTGaAGA GGAAGCAAAA AGCaCTCCA CCCAGGCaG 

6730 6740 6750 6760 6770 6780 

GAATGTTTCC ACCCAATGTC GAGCAAACCC CGCCCAGCGT CTTGTCATTG GCGAATTCGA 

6790 6800 6810 6820 6830 6840 

ACACGCAGAT GCAGTCGGGG CGGCGCG6TC CCA6GTCCAC TTCGaTATT AAG6T6ACGC 

6850 6860 6870 6880 6890 6900 

'^GTGGCCTC GAACACCGAG CGACCCTGCA 6CCAATATGG GATCGGCCAT T6AACAA6AT 

6910 6920 6930 6940 6950 6960 

GGATT6CAC6 CAGGTTCTCC GGCCGCTT6G 6T6GA6A6GC TATTC66CTA TGACTGGCa 

6970 6980 6990 7000 7010 7020 

CAACAGACAA TCGGCTGCTC TGATGCCGCC GTGTTCCGGC TGTCAGCGa GGGGCGCCCG 

7030 7040 7050 7060 7070 7080 

GTTCtnTTG TCAAGACCGA CCTGTCC6GT GCCCTGAATG AACTGCA66T AAGTGCGGCC 

7090 7100 7110 7120 7130 7140 

GTCGATGGCC 6AGGCGGCCT CGGCCTCT6C ATAAATAAAA AAAATTAGTC AGCCATGCAT 

7150 7160 7170 7180 7190 7200 

GGGGCGGAGA ATGGGCGGAA CTGGGCG6AG TTA6GGGCGG GATGGGCGGA GTTAG6GGC6 

7210 7220 7230 7240 7250 7260 

'•';actatggt tgctgactaa ttgagatgca tgctttgcat acttctgcct gctggggagc 

7270 7280 7290 7300 7310 7320 

CTGGGGACTT TCCAaCCTG 6TT6CT6ACT AATTGAGAT6 CAT6CTTTGC ATACTTCTGC 

7330 7340 7350 7360 7370 7380 

CT6CTGGGGA GCCT6G6CAC TTTCaCACC CTAACTGAa aaiTCCAC AGAATTAATT 

7390 7400 7410 7420 7430 7440 

CCCCTAGTTA TTAATA6TAA TaATTACGG GGTCATTAGT TCATAGCCa TATAT6GAGT 

7450 7460 7470 7480 7490 7500 

TCCGCGTTAC ATAACTTACG GTAAAT6GCC CGCCTG6CT6 ACCGCCCAAC 6ACCCCCGCC 

7510 7520 7530 7540 7550 7560 

aTTGACGTC AATAATGACG TAT6TTCCCA TA6TAAC6CC AATA6GGACT TTCCATTCAC 

7570 7580 7590 7600 7610 7620 

GTOATGGGT GGACTATTTA C66TAAACTG CCCACTTGGC AGTACATCAA GTGTATaTA 

7630 7640 7650 7660 7670 7680 

TGCCAAGTAC GCCCCCTATT GACGTCAATG ACGGTAAATG 6CCCGCCT66 CATTATGCCC 

7690 7700 7710 7720 7730 7740 

AGTACATGAC CTTATGGGAC TTTCCTACTT GGCAGTACAT CTACGTATTA 6TCATC6CTA 

7750 7760 7770 7780 7790 7800 

TTACCATGGT GATGC6GTTT TGGaCTACA TCAATGGGCG T6GATAGC6G TTT6ACTCAC 
7810 7820 7830 7840 7850 7860 

GCGGATTTCC AAGTCTCCAC CCCATTGACG TCAATG6GA6 TTTGTTTTGG aCCAAAATC 

7870 7880 7890 7900 7910 7920 

AAC666ACTT TCCAAAATGT CGTAACAACT CCGCCCCATT GAC6CAAATG 66C6GTA6GC 

7930 7940 7950 7960 7970 7980 

GTGTAC66T6 GGAGGTCTIIT ATAA6CA6A6 CT66CTACGT GAACCGTCAG ATCGCCT66A 

7990 8000 8010 8020 8030 8040 

GACGCarCA CAGATCTCTC ACTATC6ATT TTCAG6TGCA 6ATTATCAGC TTCCTGCTAA 

8050 8060 8070 8080 8090 8100 

TCAGTGCTTC A6TCATAATG TCCAGA6GAC AAATTCTTCT CTCCCA6TCT CCAGCAATCC 

8110 8120 8130 8140 8150 8160 

TCTCTGCATC TCCA666GAG AA66TCACAA TGACTT6CA6 G6CCA6CTCA AGT6TAACTT 

8170 8180 8190 8200 8210 8220 

•ATCCACT6 6TTCCA6CAG AA6CCA6CAT CCTCCCCCAA ACCCTG6ATT TATGCCACAT 

8230 8240 8250 8260 8270 8280 

f CAACCTGGC TTCTG6AGTC CCT6TTCGCT TCAGTGGCAG T6GGTCTGG6 ACTTCTTACT 

8290 8300 8310 8320 8330 8340 

CTCTCACAAT CA6CAGA6TG GAGGCTGAAG ATGCTGCCAC TTATTACTGC CAGUGTGGA 

8350 8360 8370 8380 8390 8400 

CTAGTAACCC ACCCACGTTC GGAG6GGGGA CaAGCTGGA AATCAAACGT ACGGTGGCTG 

8410 8420 8430 8440 8450 8460 

CACCATCTGT CTTCATCTTC CCGCaXCTG ATGAGaCTT 6AAATCT6GA ACT6CCTCTG 

8470 8480 8490 8500 8510 8520 

TTGT6TGCCT GCTCAATAAC TTCTATCCCA GA6AG6CCAA AGTACA6TGG AAGGTGGATA 

8530 8540 8550 8560 8570 8580 

:GCCCTCCA ATCGGGTAAC ICCaGGAGA GTGTCACAGA GaGGACAGC AAGGACAGCA 

8590 8600 8610 8620 8630 8640 

CCTACAGCCT CAGCAGCACC CTGACGCTGA 6CAAAGCAGA CTAC6A6AAA aCAAAGTCT 

8650 8660 8670 8680 8690 8700 

ACGCCreCGA AGXaCCCAT CA6G6CCT6A GCTC6CCCCT aCAAAGAGC TTCAACA66G 

8710 8720 8730 8740 8750 8760 

6A6AST6TTG AATTaCATC C6TTAAC66T TACCAACTAC CTAGACT66A TTCCTGACAA 

8770 8780 8790 8800 8810 8820 

CATGC6GCCG T6ATATCTAC GTATGAiaG CCTCGACTGT GCCTTCTAGT TGCaGCCAT 

8830 8840 8850 8860 8870 8880 

CT6TTGTTTG CCCCTCCCCC GTGCCTTCCT TGACCCTGGA AGGTGCCACT CCCACT6TCC 

8890 8900 8910 8920 8930 8940 

TTTCCTAATA AAATGAGGAA ATTGCATCGC ATTGTCTGAG TAGGTGXaT TCTATTCTG6 

8950 8960 8970 8980 8990 9000 

GG66TGG6GT GGGGUGGAC AGCAAGGGGG AGGATTG6GA AGACAATAGC AGGCATGCTG 

9010 9020 9030 9040 9050 9060 

G6GATGCGGT GGGCTCTATG GAACCAGCTG GGGCTCGACA GCTATGCCAA GTACGCCCCC 

9070 9080 9090 9100 9110 9120 

TATTGACGTC AATGACGGTA AATGGCCC6C CTGGCATTAT GCCCAGTAa TGACCTTAT6 
9130 9140 9150 9160 9170 9180 

CGACTTTCCT ACTTGCaGT AaTCTACGT ATTAGTCATC CCTATTACCA TGGTGAT6CC 

9190 9200 9210 9220 9230 9240 

GTTTTGGaG TACATCAATG GGCGTGGATA GCGGTTTGAC TCACG66GAT TTCCAAGTCT 

9250 9260 9270 9280 9290 9300 

CCACCCCATT 6ACGTCAATG GGA6TTT6TT TTGGCACCAA AATCAACGG6 ACTTTCCAAk 

9310 9320 9330 9340 9350 9360 

ATGTCGTAAC AACTCC6CCC aiTGACGa AATG66C66T AG6CGTGTAC GGTG6GAGGT 

9370 9380 9390 9400 9410 9420 

CTATATAAGC A6A6CT6GGT ACGTCCTCAC ATTaGTGAT CAGaCTGAA aaGACCCG 

9430 9440 9450 9460 9470 9480 

TCGACATGGG TTGGAGCCTC ATCTT6CTCT TCCTT6TCGC TGTT6CTACG CGT6TCCT6T 

9490 9500 9510 9520 9530 9540 

CCCAGGTACA ACTCaGaG CCT6GGGCTG AGCTGGTGAA GCCT6G6GCC TCAGTGAAGA 

9550 9560 9570 9580 9590 9600 

TGTCCTGCAA GGCTTCTGGC TAaCATTTA CCAGTTACAA TATGaCTGG GTAAAACAGA 

9610 9620 9630 9640 9650 9660 

CACCTG6TC6 6GGCCTGGAA TGGATTGGAG CTATTTATCC CG6AAAT66T GATACTTCCT 

9670 9680 9690 9700 9710 9720 

ACAATCAGAA GTTCAAA6GC AAG6CCACAT TGACT6CAGA CAAATCCTCC A6CACA6CCT 

9730 9740 9750 9760 9770 9780 

ACATGaGCT aCCAGCCTG AaTCTGAGG ACTCTGCGGT CTATTACTGT GCAAGATCGA 

9790 9800 9810 9820 9830 9840 

CTTACTACGG CGGTGACTGG TACTTCAATG TCTGGGGCGC AG6GACCACG GTCACCGTCT 

9850 9860 9870 9880 9890 9900 

CTGCAGCTAG CACCAAGGGC CCATCGGTCT TCCCCCTGGC ACCCTCCTCC AA6AGCACCT 

9910 9920 9930 9940 9950 9960 

CTGG66GCAC AGCG6CCCTG GGCTGCCTGG TCAAG6ACTA CTTCCCC6AA CCSGTGACG6 

9970 9980 9990 10000 10010 10020 

T6TC6T6GAA CTCA66C6CC CTGACCA6C6 GC6TGCACAC CTTCCCGGCT 6TCCTACAGT 

10030 10040 10050 10060 10070 10080 

CCTCA6GACT CTACTCCCTC AGaGCGTGG T6ACC6T6CC CTCCAGaCC TT66GCACCC 

10090 10100 10110 10120 10130 10140 

AGACCTACAT CT6CAAC6TG AATCACAA6C CCAGCAAaC CAAGGT66AC AAGAAAGaG 

10150 10160 10170 10180 10190 10200 

A6CCCAAATC TTGTGACAAA ACTCACACAT GCCCACCGTG CCCAGaCCT GAACTCCTGG 

10210 10220 10230 10240 10250 10260 

GG6GACCGTC AGTCTTCCTC TTCCCCCCAA AACCCAAGGA CACCCTCATG ATCTCCCGGA 

10270 10280 10290 10300 10310 10320 

CCCCTGA6GT CAaTGCGTG GTGGTG6ACG TGAGCaCGA AGACCCTGAG GTCAAGTTCA 

10330 10340 10350 10360 10370 10380 

ACTGGTACGT G6ACGGCGT6 GAGGTGCATA ATGCCAAGAC AAAGCCGCGG GAGGAGCAGT 



10390 10400 10410 10420 10430 10440 

ACAAaCac GTACCGTGTG GTCACCGTCC TaCCCTCCT CacaCCAC TG6CTCAATC 

10450 10460 10470 10480 10490 10500 

GCAA66AGTA CAA6TGCAA6 GTCTCCAACA AAGCCCTCCC AGCCCCCATC GAGAAAACa 

10510 10520 10530 10540 10550 10560 

TCTCCAAAGC CAAAGGGaG CCCCGAGAAC CAaGGTGTA aCCCTGCCC CCATCCCGG6 

10570 10580 10590 10600 10610 10620 

AT6AGCTGAC CAAGAACCAG GTCAGCCTGA CCT6CCTGGT CAAAGGCTTC TATCCCAGCG 

10630 10640 10650 10660 10670 10680 

AaTCGCCGT GGAGTG6GAG AGCAATG6GC AGCCG6AGAA CAACTACAAG ACaCGCCTC 

10690 10700 10710 10720 10730 10740 

CC6T6CTG6A CTCC6AC66C TCCTTCTTCC TCTACAGaA GCTCACC6TG GAaAGACa 

10750 10760 10770 10780 10790 10800 

••TTCGaGCA GGGGAACGTC TTCTCATGCT CCGTGATGCA TGA6GCTCTG CACAACCACT 

10810 10820 10830 10840 10850 10860 

ACACGaCAA GAGCCTCTCC CTGTCTCCGG GTAAATGAGG ATCCGTTAAC GGTTACCAAC 

10870 10880 10890 10900 10910 10920 

TACCTAGACT GCATTCGTGA CAACATGC6G CCGTGATATC TACGTATGAT CAGCCTCGAC 

10930 10940 10950 10960 10970 10980 

T6T6CCTTCT AGTT6CCAGC CATCT6TT6T TT6CCCCTCC CCCCTGCCTT CCTTGACCCT 

10990 11000 11010 11020 11030 11040 

GGAA6GT6CC ACTCCCACTG TCCTTTCCTA ATAAAAT6A6 GAAATTGaT CGCATTGTCT 

11050 11060 11070 11080 11090 11100 

GA6TAGGTGT CATTCTATTC TGGG6GGTGG GGTCGGGCAG GACAGCAAGG GGGAG6ATT6 

11110 11120 11130 11140 11150 11160 

'«U^AGACAAT AGCAGGCATG CT66GGATGC GGTGG6CTCT ATG6AACCAG CTGGGGCTCG 

11170 11180 11190 11200 11210 11220 

ACAGaACGC TAG6TCGACC CCGCTACTAA CTCTCTCCTC CCTCCTTTTT CCTGCAGGAC 

11230 11240 11250 11260 11270 11280 

GACCaCCGC GGCTATC6TC GCTG6CCAC6 ACGGCCGTTC CTTGCGCAGC TGT6CTCGAC 

11290 11300 11310 11320 11330 11340 

GTTGTaCTG AA6CGG6AA6 66ACT66CTG CTATTGGGCG AA6TGCCGGG GaGGATCTC 

11350 11360 11370 11380 11390 11400 

CT6TCATCTC ACCTTGCTCC TGCCGAGAAA 6TATCCATCA T6GCTGATGC AATGCGGCGG 

11410 11420 11430 11440 11450 11460 

CreaTACGC TTGATCC66C TACCTGCCa TTCGACCACC AAGCGAAAa TCGaTCGAG 

11470 11480 11490 11500 11510 11520 

CGAGCACGTA CTC6GAT66A A6CCG6TCTT GTCGATaCG AT6ATCTGGA CGAA6AGCAT 

11530 11540 11550 11560 11570 11580 

CA66G6CTC6 C6CCAGCC6A ACTGTTCGCC AGGTAAGTGA GCTCCAATTC AAGCTTCCTA 

11590 11600 11610 11620 11630 11640 

G66CGGCCAG CTAGTAGCTT TGCTTCTCAA TTTCTTATTT GCATAATGAG AAAAAAAG6A 



11650 11660 11670 11680 11690 11700 

AAATTAATTT TAACACCAAT TCAGTAGTT6 ATTGAGCAAA TGCGTT6CCA AAAAG6ATGC 
11710 11720 11730 11740 11750 11760 

TTTAGAGACA GTGTTCTCTG CACACATAAG GACAAACATT ATTCA6AGGG AGTACCCAGA 

11770 11780 11790 11800 11810 11820 

GCT6AGACTC CTAAGCCAGT GAGTGGCAa GCATCCA6G6 AGAAATAT6C TTGTCATCAC 

11830 11840 11850 11860 11870 11880 

C6AAGCCTGA TTCCGTAGAG CCAaCCCTG GTAA666CCA ATCTGCTaC AaCGATAGA 

11890 11900 11910 11920 11930 11940 

GA6GGCAGGA GCCAG6GCAG AGCATATAAG GT6AGGTAG6 ATCAGTT6CT CCTCACATTT 

11950 11960 11970 11980 11990 12000 

GCTTCTGACA TAGTT6TGTT GGGAGCTTGG ATAGCTT6GG GGGGGGAaC CTaCGGCTG 

12010 12020 12030 12040 12050 12060 

CGATTTCCCG CCAAAOTGA CGGCAATCCT AGCGTGAAGG CTGGTAGGAT TTTATCCCCG 

12070 12080 12090 12100 12110 12120 

GCCATar GGTTCGACCA TTGAACTGCA TC6TCGCCGT 6TCCCAAAAT AT6G6GATT6 

12130 12140 12150 12160 12170 12180 

jSCAAGAACGG AGACCTACCC TGGCCTCCGC TCAG6AAC6A GTTCAAGTAC TTCCAAAGAA 

12190 12200 12210 12220 12230 12240 

TGACCACAAC CTCTTCAGTG GAA6GTAAAC AGAATCTGGT 6ATTATGG6T AGGAAAACCT 

12250 12260 12270 12280 12290 12300 

GGTTCTCCAT TCCTGAGAA6 AATCGACCTT TAAAGGACAG AATTAATATA GTTCTaGTA 

12310 12320 12330 12340 12350 12360 

GAGAACTCAA AGAACCACCA CGAGGAGCTC Al I i ILI IGC CAAAA6T7TG GATGATGCCT 

12370 12380 12390 12400 12410 12420 

TAAGACTTAT TGAACAACCG GAATTGGCAA GTAAA6TAGA CATGGTTTGG ATAGTCG6AG 

12430 12440 12450 12460 12470 12480 

AGTTCTGT TTACCAG6AA GCaiGAATC AACCAGGCCA CCTCAGACTC TTTGTGACAA 

12490 12500 12510 12520 12530 12540 

6CATCAT6CA GGAATTTGAA AGTGACACGT TTTTCCCAGA AATTGATTTG G6GAAATATA 

12550 12560 12570 12580 12590 12600 

AACTTCTCCC AGAATACCCA GGCGTCCTCT CTGAGGTCCA GGAGGAAAAA GCaTCAAGT 

12610 12620 12630 12640 12650 12660 

ATAAGTTTGA AGTCTACGAG AA6AAAGACT AACAG6AAGA T6CTTTCAA6 TTCTCTGCTC 

12670 12680 12690 12700 12710 12720 

CCCTCCTAAA GCTAT6CATT TTTATAAGAC CATG66ACTT TTGCT66CTT TAGATCA6CC 

12730 12740 12750 12760 12770 12780 

TCGACT6T6C CTTCTAGTTG CCAGCCATCT GUfalll GCC CCTCCCCCGT GCCTTCCTTG 

12790 12800 12810 12820 12830 12840 

ACCCTG6AAG GTGCCACTCC aCTGTCCTT TCCTAATAAA ATGAGGAAAT TCaTCGCAT 

12850 12860 12870 12880 12890 12900 

TGTCTGAGTA GGT6TCATTC TATTCTGGG6 6GTGGGGTGG GGCAGGACAG CAAGGG6GAG 

12910 12920 12930 12940 12950 12960 

6ATT6GGAAG ACAATAGCAG GCATGCTGGG 6AT6CGGTGG GCTCTATGGC TTCT6AGGCG 

12970 12980 12990 13000 13010 13020 

6AAA6AACCA GCTGG66CTC GAAGCGGCC6 CCCATTTCGC TGGT6GTCAG ATGC666ATG 
13030 13040 13050 13060 13070 13080 



GCGTGG6ACG CGGCGGG6AG CGTaCACTG AGGTTTTCCG CCAGACGCCA CT6CTGCCAG 

13090 13100 13110 13120 13130 13140 

6CGCTGATGT GCCCG6CTTC TGACCATGC6 GTCGCGTTCG GTTGaCTAC GC6TACTGT6 

13150 13160 13170 13180 13190 13200 

AGCCAGAGTT GCCC6GC6CT CTCC66CTGC 6GTAGTTCAG 6CAGTTCAAT aACTGTTTA 

13210 13220 13230 13240 13250 13260 

CCTT6TGGAG C6ACATCCAG A6GCACTTCA CCGCTTGCa GC66CTTACC ATCaCCGCC 

13270 13280 13290 13300 13310 13320 

ACCATCCAGT GCA6GAGCTC GTTATCGCTA TGACGGAACA GGTATTC6CT GGTaCTTCG 

13330 13340 13350 13360 13370 13380 

AT6GTTTGCC CGGATAAACG GAACTGGAAA AACTGCT6CT GGTGTrTTGC TTCCGTaGC 

13390 13400 13410 13420 13430 13440 

u .GGATGCG GC6TGCGGTC GGCAAAGACC AGACCGTTCA TAaGAACTG GC6ATC6TTC 

13450 13460 13470 13480 13490 13500 

GGCGTATCGC CAAAATCACC GCCGTAA6CC 6ACCACGGGT TGCCuii iiC ATCATATTTA 

13510 13520 13530 13540 13550 13560 

ATCAGCGACT GATCCACCa GTCCCAGACG AA6CCGCCCT GTAAACGGGG ATACTGACGA 

13570 13580 13590 13600 13610 13620 

AACGCCTGCC AGTATTTAGC GAAACCGCCA AGACTGTTAC CCATCGCGTG GGCGTATTCG 

13630 13640 13650 13660 13670 13680 

CAAAGGATCA GCGG6CGCGT CTCTCCAGGT AGCGAAAGCC ATTTTTTGAT GGACaTTTC 

13690 13700 13710 13720 13730 13740 

GGCACAGCCG GGAAGGGCTG GTCTTCATCC ACGCGCGCGT ACATC6GGCA AATAATATCG 

13750 13760 13770 13780 13790 13800 

\,.66CCGT66 TGTCGGCTCC GCCGCCTTCA TACTGaCCG GGCG6GAAGG ATC6ACAGAT 

13810 13820 13830 13840 13850 13860 

TT6ATCCA6C GATACAGCGC GTCGTGATTA GC6CCGT6GC CTGATTCATT CCCaGCGAC 

13870 13880 13890 13900 13910 13920 

CA6ATGATCA CACTCG66TG ATTACGATCG CGCTGCACCA TTCGCGTTAC GCGTTCGCTC 

13930 13940 13950 13960 13970 13980 

ATCGCCG6TA GCCA6C6CGG ATaTCGGTC AGACGATTCA TTGGaCCAT 6CCGTG6GTT 

13990 14000 14010 14020 14030 14040 

TCAATATTGG CTTCATCCAC CACATACAGG CCGTAGCGGT CCaCAGCGT GTACaaCC 

14050 14060 14070 14080 14090 14100 

66ATGGTTC6 GATAAT6C6A AaGCGCACG GCGTTAAAGT T6TTCT6CTT CATaGCAGG 

14110 14120 14130 14140 14150 14160 

ATATCCTGCA CCATCGTCT6 CTCATCCAT6 ACCTGACCAT CaCAGGATG ATGCTCGTGA 

14170 14180 14190 14200 14210 14220 

CGCTTAACGC CTCGAATCAG CAACGGCTTG CCGTTCAGCA GCAGCAGACC ATTTTCAATC 

14230 14240 14250 14260 14270 14280 

CGCACCTCGC G6AAACC6AC ATC6CAGGCT TCTGCTTCAA TCA6C6TGCC CTCGGCGGTG 



14290 14300 14310 14320 14330 14340 

TGCAGTTCAA CCACCGCACG ATAGAGATTC GGGATTTCCa CGCTCCACAG TTTCGGGTTT 

14350 14360 14370 14380 14390 14400 

TCGACGTTa 6ACGTA6T6T 6ACGCGATC6 GCATAACCAC CACGCTaTC 6ATAATTTCA 

14410 14420 14430 14440 14450 14460 

CCGCC6AAAC GC6CG6T6CC GCTG6CGACC TGC6TTTCAC CCTGCaTAA AGAAACTGTT 

14470 14480 14490 14500 14510 14520 

ACCCGTA66T A6TCACGCAA CTC6CCGCAC ATCT6AACTT CAGCCTCaC TAaCCGCGG 

14530 14540 14550 14560 14570 14580 

CT6AAATCAT CATTAAA6CG AGTGGCAACA TGGAAATCGC TGATTTGT6T AGTCG6TTTA 

14590 14600 14610 14620 14630 14640 

TGCAGCAACG AGACGTCACG GAAAATGCCG CTCATCCGCC AaTATCCTG ATCTTCaGA 

14650 14660 14670 14680 14690 14700 

TAACTGCCGT CACTCCAGCG aGaCCATC ACCGCGAGGC 66TTTTCTCC GGC6C6TAAA 

14710 14720 14730 14740 14750 14760 

AATGC6CTCA GGTCAAATTC. AGAC6GCAAA CGACTGTCCT GGCC6TAACC GACCCAGC6C 

14770 14780 14790 14800 14810 14820 

CCGTTGCACC ACAGAT6AAA CGCC6AGTTA ACGCCATCAA AAATAATTC6 CGTCTGGCCT 

14830 14840 14850 14860 14870 14880 

TCCTGTAGCC AGCTTTCATC AACATTAAAT GTGAGCGAGT AACAACCCGT CG6ATTCTCC 

14890 14900 14910 14920 14930 14940 

GTGGGAACAA ACGGCGGATT GACCGTAATG GGATAGGTGA CCTT66TGTA 6AT6G6CGCA 

14950 14960 14970 14980 14990 15000 

TCGTAACCGT GCATCTGCCA 6TTTGAGGGG ACGACGACAG TATCGGCCTC AGGAAGATCG 

15010 15020 15030 15040 15050 15060 

'•ACTCCA6CC A6CTTTCCGG aCCGCTTCT G6TGCCGGAA ACCAGGCAAA GCGCCATTCG 

15070 15080 15090 15100 15110 15120 

carrcAGGC tgcgcaactg ttgggaaggg cgatcggtgc gggcctcttc gctattacgc 

15130 15140 15150 15160 15170 15180 

CA6CTGGCGA AAG6GGGATG TGCTGCAAGG CGATTAAGTr GGGTAACGCC AGGGTTTTCC 

15190 15200 15210 15220 15230 15240 

CA6TCACGAC GTT6TAAAAC GACTTAATCC GTCGAGGGGC T6CCTCGAAG aGACGACCT 

15250 15260 15270 15280 15290 15300 

TCCGTTGT6C A6CCAGC6GC GCCTGCGCCG GTGCCaCAA TCGTGCGCGA ACAAACTAAA 

15310 15320 15330 15340 15350 15360 

CaGAACAAA TTATACC6GC GCaCCGCCG CCACCACCTT CTCCCGTGCC TAACATTCCA 

15370 15380 15390 15400 15410 15420 

6C6CCTCCAC CACCACCACC ACCATCGATG TCTGAATTGC CGCCCGCTCC ACCAATGCCG 

15430 15440 15450 15460 15470 15480 

AC6GAACCTC AACCC6CTGC ACCTTTAGAC GACAGACAAC AATTGTTGGA AGCTATTAGA 

15490 15500 15510 15520 15530 15540 

AACGAAAAAA ATCGaCTCG TCTCAGACC6 6TCAAACCAA AAACGGCGCC CGAAACCA6T 

15550 15560 15570 15580 15590 15600 

ACAATAGTTG AGGTGCCGAC TGTGTTGCCT AAAGA6ACAT TTGA6CCTAA ACCGCC6TCT 

[Positions 15601 through 16920 are illegible in the source.]

16930 16940 16950 16960 16970 16980 

TACTGAGAGT GCACaTATG CG6T6T6AAA TACCGCACAG AT6CGTAAGG AGAAAATACC 

16990 17000 17010 17020 17030 17040 

CaTCAGGCG CTCTTCCGCT TCCTCGCXa CTGACTC6CT GC6CTCG6TC GTrCGGCTGC 

17050 17060 17070 17080 17090 17100 

G6C6A6CGGT ATaGCTCAC TaAAGGCGG TAATACGGTT ATCCACAGAA TaGGGGATA 

17110 17120 17130 17140 17150 17160 

ACGCAGGAAA GAACATGTGA GaAAAGGCC AGCAAAA6GC CAGGAACCGT AAAAAGGCCG 

17170 17180 17190 17200 17210 17220 

CGTrGCTGGC GTTTTTCCAT AGCCTCCGCC CCCCTGAC6A CaiaCAAA AATC6AC6CT 

17230 17240 17250 17260 17270 17280 

CAAGTCA6AG GTGGCGAAAC CCGACAGGAC TATAAA6ATA CCAGGCGTTT CCCCCTGGAA 

17290 17300 17310 17320 17330 17340 

6CTCCCTCGT GC6CTCTCCT GTTCCGACCC TGCCGCTTAC CGGATACCTG TCCGCCTTTC 

17350 17360 17370 17380 17390 17400 

TCCCTTCGGG AA6CGTGGCG CTTTCTCATA GCTCACGCTG TAGGTATCTC AGTTCGGTGT 

17410 17420 17430 17440 17450 17460 

; A6GTCGTTCG CTCaAGCTG GGCTGTGTGC ACGAACCCCC CGTTCAGCCC GACCGCTGCG 

17470 17480 17490 17500 17510 17520 

CCTTATCCGG TAACTATCGT CTTGAGTCCA ACCCGGTAA6 AaCGACTTA TCGCaCTGG 

17530 17540 17550 17560 17570 17580 

aCCAGCCAC TG6TAACAGG ATTAGCAGAG CGAGGTATGT AGGCGGTGCT ACAGAGTTCT 

17590 17600 17610 17620 17630 17640 

TGAAGTGGTG GCCTAACTAC GGCTACACTA GAAGGACAGT ATTTGGTATC T6CGCTCTGC 

17650 17660 17670 17680 17690 17700 

TGAAGCCAGT TACCTTCGGA AAAAGAGTT6 GTAGCTCTTG ATCCGGCAAA aAACaCCG 

17710 17720 17730 17740 17750 17760 

CTGGTAGCGG TG67TTTTTT GTTTGCAA6C AGaGATTAC CCGCA6AAAA AAA66ATCTC 

17770 17780 17790 17800 17810 17820 

AA6AA6ATCC TTT6ATCTTT TCTACGGGGT CTGACGCTa 6TGGAACGAA AACTCACGTT 

17830 17840 17850 17860 17870 17880 

AAGGGAmr GGTaTGAGA TTATCAAAAA GGATCTTCAC CTAGATCCTT TTAAATTAAA 

17890 17900 17910 17920 17930 17940 

AAT6AAGTTT TAAATOMTC TAAAGTATAT AT6A6TAAAC TTG6TCTGAC A6TTACCAAT 

17950 17960 17970 17980 17990 18000 

GCTTAATCAG TGAGGCACCT ATCTCAGCGA TCTGTCTATT TC6TTCATCC ATA6TTGCCT 

18010 18020 18030 18040 18050 18060 

6ACTCCCCGT CGTGTAGATA ACTACGATAC GGGAG6GCTT ACCATCT66C CCaCTGCTG 

18070 18080 18090 18100 18110 18120 

CAATGATACC GCGAGACCa CGCTaCCGG CTCCAGATTT ATCAGCAATA AACCAGCCAG 

18130 18140 18150 18160 18170 18180 

CCG6AAGGGC CGAGCGCAGA AGTGGTCCTG CAACTTTATC C6CCTCCATC CAGTCTATTA 



18190 18200 18210 18220 18230 18240 

ATTGTTGCCG GGAAGCTAGA GTAAGTAGTT CGCCAGTTAA TAGTTTGCGC AACGTTGTTG 

18250 18260 18270 18280 18290 18300 

CCATTGCTGC AGGaTCGTG GTGTaCGCT CGTCGTTTGG TATGGCTTCA TTCAGCTCCG 

18310 18320 18330 18340 18350 18360 

GTTCCaACG ATCAAGGCGA GTTACATGAT CCCCCATGTT GTGCAAAAAA GCGGTTAGCT 

18370 18380 18390 18400 18410 18420 

CCTTCGGTCC TCCGATCGTT GTaGAAGTA AGTTGGCCGC AGTGTTATCA CTCATGGTTA 

18430 18440 18450 18460 18470 18480 

TGGCAGCACT GCATAATTCT CTTACTGTCA TGCCATCCGT AAGATGCTTT TCTGTGACTG 

18490 18500 18510 18520 18530 18540 

GTGAGTACTC AACCAAGTCA TTCTGACAAT AGTGTATGCG GCGACCGAGT TGCTCTTGCC 

18550 18560 18570 18580 18590 18600 

-';CCGTCAAC ACGGGATAAT ACCGCGCCAC ATAGCAGAAC TTTAAAAGTG CTCATCATTG 

18610 18620 18630 18640 18650 18660 

GAAAACGTTC TTCGGGGCGA AAACTCTCAA GGATCTTACC GCTGTTGAGA TCCAGTTCGA 

18670 18680 18690 18700 18710 18720 

TGTAACCCAC TCGTGCACCC AACTGATCTT CAGCATCTTT TACTTTCACC AGCGTTTCTG 

18730 18740 18750 18760 18770 18780 

GGTGAGCAAA AACAGGAAGG CAAAATGCCG CAAAAAAGGG AATAAGGGCG ACACGGAAAT 

18790 18800 18810 18820 18830 18840 

GTTGAATACT CATACTCTTC Cll IIICAAT ATTATTGAAG aTTTATCAG GGTTATTGTC 

18850 18860 18870 18880 18890 18900 

TCATGAGCGG ATACATATTT GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA 

18910 18920 18930 18940 18950 18960 

'MTTCCCCG AAAAGTGCCA CCTGACGTCT AAGAAACCAT TATTATCATG AaTTAACCT 

18970 18980 18990 19000 19010 19020 

ATAAAAATAG GCGTATCACG AGGCccnrc GTCTTCAAGA 

FIGURE 9 



Pac I (5) 




PshA l(9iao| 



Onilll(i4Qai'^ 




BamH Itionq^ 
Sma I (10594) 

Sac II (10585) 



BsiW I (7528) 

EeoR i(WO) 



Nha l(58<9 



D = Inactive Dihydrofolate reductase    SO = SV40 Origin of replication 

S = CMV and SV40 enhancers 

H = Inactive Salmonella Histidinol Dehydrogenase 

T = Herpes simplex thymidine kinase promoter and polyoma enhancer 

C = Cytomegalovirus promoter/enhancer    B = Bovine growth hormone polyadenylation 
N1 = Neomycin phosphotransferase exon 1    N2 = Neomycin phosphotransferase exon 2 
K = Human kappa constant    G1 = Human Gamma 1 constant 

VL = Variable light chain anti-CD23 primate 5E8 and leader 
VH = Variable heavy chain anti-CD23 primate 5E8N- and leader 



Mandy cut Xba I Xho I and ligated to Xba I Xho I fragment from XKG1«fCD23 5E8N-SHL 
Map by Mitchell Reff    Constructed by Karen McLachlan    OaM/97    19,035 bp 
10 20 30 40 50 60 

TTAATTAAGG GGCGGAGAAT GGGCGGAACT GGGCGGAGTT [illegible] [illegible] 

70 80 90 100 110 120 

TAGGGGCGGG ACTATGGTTG CTGACTAATT GAGATGCATCi [illegible] [illegible] 

130 140 150 160 170 180 

TGGGGAGCCT GGGGACTTTC CACACCTGGT TuCTGACTAA [illegible] TCrTTTCrAT 

190 200 210 220 230 240 

ACTTCTGCCT GCTGGGGAGC CTGGGGACTT TCCACACLCI [illegible] [illegible] 

250 260 270 280 290 300 

AATTAATTCC CCTAGTTATT AATAGTAATC AATTACuuvw [illegible] ATAcrrrATA 

310 320 330 340 350 360 

TATGGAGTTC CGCGTTACAT AACTTACGGT AAATGQCCCu [illegible] [illegible] 

370 380 390 400 410 420 

:CCCGCCCA TTGACGTCAA TAATGACGTA TGTTCCCATA [illegible] [illegible] 

430 440 450 460 470 480 

CCATTGACGT CAATGGGTGG AGTATTTACG GTAAACTGCC CACTTQGCAm [illegible] 

490 500 510 520 530 540 

GTATCATATG CCAAGTACGC CCCaATTGA CGTGVATGAC GGTAAATGGC CwbCCTbUCA 

550 560 570 580 590 600 

TTATGCCCAG TACATGACCT TATGGGACTT TCCTACTTGG CAGTACATCT [illegible] 

610 620 630 640 650 660 

CATCGCTATT ACCATGGTGA TGCGGITTTG GulGTACATC [illegible] [illegible] 

670 680 690 700 710 720 

TGACraCGG GGATTTCCAA GTCTCCACCC UlTTGACuTC [illegible] [illegible] 

730 740 750 760 770 780 

/GTTTAAAC AGCTTGGCCG GCCAGCTTTA TTTAACuTGT [illegible] [illegible] 

790 800 810 820 830 840 

ACTAACGACA GTGATGAAAG AAATACAAAA [illegible] [illegible] [illegible] 

850 860 870 880 890 900 

TTATTACAAA ACAAAACACA AACGAATATC GACAAAULTA [illegible] ATAACATTTC 

910 920 930 940 950 960 

fiCAAGTrTTG TGGCGTTGAG CGAAAATCCA [illegible] TAcrrATrcc TTCCCAAAAA 

970 980 990 1000 1010 1020 

CAACCCTTCT TTGAAACTAA TCGAAACCTA [illegible] [illegible] TTTAATATTT 

1030 1040 1050 1060 1070 1080 

AAATTaGAT ATAAAGACGC TGAAAATaT TTGATTTTCG [illegible] [illegible] 

1090 1100 1110 1120 1130 1140 

GATTATAAAT TTAATGAATT ATTAAAATAC ATacaACT ATATATTGAT AGACATTTCC 

1150 1160 1170 1180 1190 1200 

AGTTTGTGAT ATTAGTTTGT GCGTCTCATT ACAATGCCTG TTATTTTTAA aACAAACAA 

1210 1220 1230 1240 1250 1260 

CTGCTCGCAG ACAATAGTAT AGAAAAGGGA GGTGAACTGT TTTTGTTTAA CGGTTCGTAC 

1270 1280 1290 1300 1310 1320 

AACATTTT6G AAAGTTATGT TAATCCGGTG CTGCTAAAAA ATGGTGTAAT TGAACTAGAA 
1330 1340 1350 1360 1370 1380 

GAACCTGC6T ACTATGCCGG CAACATATT6 TACAAAACCG ACGATCCCAA ATTaTTGAT 

1390 1400 1410 1420 1430 1440 

TATATAAATT TAATAATTAA AGCAACACAC TCCGAA6AAC TACCAGAAAA TA6CACT6TT 

1450 1460 1470 1480 1490 1500 

GTAAATTACA GAAAAACTAT GCGCAGCGGT ACTATACACC CCATTAAAAA AGAaTATAT 

1510 1520 1530 1540 1550 1560 

ATTTAT6ACA AaAAAAATT TACTCTATAC 6ATAGATACA TATATG6ATA CGATAATAAC 

1570 1580 1590 1600 1610 1620 

TATGTTAATT TTTATGAGGA GAAAAATGAA AAAGAGAAGG AATACGAAGA AGAAGAC6AC 

1630 1640 1650 1660 1670 1680 

AA6GCGTCTA GTTTAT6T6A AAATAAAATT ATATTGTCGC AAATTAACTG TGAATaiTT 

1690 1700 1710 1720 1730 1740 

GAAAATGATT TTAAATATTA CCTCAGCGAT TATAACTACG CGTTTTCAAT TATAGATAAT 

1750 1760 1770 1780 1790 1800 

ACTAaAATG TTCTT6TT6C GTTTGGTTTG TATCGTTAAT AAAAAACAAA TITGAaTTT 

1810 1820 1830 1840 1850 1860 

ATAATTGTTT TATTATTaA TAATTACAAA TAGGATTGAG ACCCTTGCAG TTGCCAGaA 

1870 1880 1890 1900 1910 1920 

AC6GACAGAG CTTGTCGA6G AGAGTrGTTG ATTCATT6TT TGCCTCCCT6 CTGC6GTTTT 

1930 1940 1950 1960 1970 1980 

TCACC6AAGT TCATGCaCT CCA6C6TTTT TGCAGCA6AA AAGCC6CCGA CTTCGGTTT6 

1990 2000 2010 2020 2030 2040 

CGGTCGCGAG TGAAGATCCC TTTCTTGTTA CCGCCAACGC CaATATGCC TTGC6AGGTC 

2050 2060 2070 2080 2090 2100 

GaAAATCGG CGAAATTCCA TACCTGTTCA CCGACGACGG CGCTGACGCG ATCAAAGACG 

2110 2120 2130 2140 2150 2160 

CGGTGATAa TATCCAGCCA T6CACACTGA TACTCTTCAC TCCACAT6TC GGTGTACATT 

2170 2180 2190 2200 2210 2220 

6AGTGCA6CC CG6CTAACGT ATCCACGCCG TATTC6GTGA TGATAATC6G CT6ATGCA6T 

2230 2240 2250 2260 2270 2280 

TTCTCCTCCC AGGCaCAAG TTLlll I ICC A6TACCTTCT CTGCCGTTTC CAAATCGCCG 

2290 2300 2310 2320 2330 2340 

CTTTG6ACAT ACCATCCGTA ATAAC6GTTC AGGCACAGa aTCAAAGAG ATCGCTGAT6 

2350 2360 2370 2380 2390 2400 

GTATC66TGT GA6C6TCGCA GAACATTACA TTGACGaCG TGATCG6ACG CGTCG66TC6 

2410 2420 2430 2440 2450 2460 

AGHTACGCG TTGCTTCCGC CAGTGGCGCG AAATATTCCC GTGCACCTTG CG6AC6GGTA 

2470 2480 2490 2500 2510 2520 

TCC6GTTCGT TGGaATACT CCAaTCACC ACGCTTGGGT GGTTrTTGTC ACGCGCTATC 

2530 2540 2550 2560 2570 2580 

A6CTCTTTAA TCGCCTGTAA GTGCGCTT6C TGAGTTTCCC CGTT6ACTGC CTCTTCGCTG 



2590 2600 2610 2620 2630 2640 
TACAGTTCTT TCGGCTTGTT GCCCGCTTCG AAACCAATGC CTAAAGAGAG GTTAAAGCCG 

2650 2660 2670 2680 2690 2700 

ACAGCAGCAG TTTCATCAAT CACCACGAT6 CCATGTTCAT CTGCCCA6TC 6A6CATCTCT 

2710 2720 2730 2740 2750 2760 

TCAGCGTAAG G6TAATGCGA GGTAC6GTAG GA6TTGGCCC CAATCCAGTC aTTAATGCG 



2770 2780 2790 2800 2810 2820 

T66TC6T6CA CCATCA6CAC CTTATCGAAT CCTTTGCaC GCAAGTCCGC ATCTTaTGA 

2830 2840 2850 2860 2870 2880 

CGACCAAAGC CAGTAAAGTA GAACGGTTTG TGGTTAATCA 6GAACT6TTC GCCCTTaCT 

2890 2900 2910 2920 2930 2940 

GCCACTGACC GGATGCCGAC GCGAAGCGGG TAGATATCAC ACTCTGTCTG GCmTGGCT 

2950 2960 2970 2980 2990 3000 

TGACGCACA GTTCATAGAG ATAACCTTCA CCCG6TTGCC AGAGGTGCGG ATTaCCACT 

3010 3020 3030 3040 3050 3060 

TGCAAAGTCC CGCTA6TGCC TTGTCCAGTT GaACCACCT 6TTGATCC6C ATCACGCAGT 

3070 3080 3090 3100 3110 3120 

TCAAC6CTGA CATCACCATT GGCCACCACC TGCCAGTCAA CAGACGCGTG GTTAaGTCT 

3130 3140 3150 3160 3170 3180 

TGC6C6ACAT GCGTCACCAC 6GTGATATCG TCCACCGIGG TGTTCGGCGT GGTGTAGAGC 

3190 3200 3210 3220 3230 3240 

ATTACGCT6C GATGGATTGC GGaTAGTTA AAGAAATCAT GCAAGTAA6A CT6CTTTTTC 

3250 3260 3270 3280 3290 3300 

TT6CCGT7TT CGTCGGTAAT CACaTTCCC GGCG6GATAG TCTGCCAGTT aGTTCGTTG 

3310 3320 3330 3340 3350 3360 

TCACACAAA CGGT6ATACC CCTCGACGGA TTAAAGACTT CAA6CGGTCA ACTATGAAGA 

3370 3380 3390 3400 3410 3420 

A6T6TTC6TC TTCGTCCCA6 TAAGCTATGT CTCCAGAAT6 TAGCCATCCA TCCTTGTaA 

3430 3440 3450 3460 3470 3480 

TCAAGGCGTT 6GTCGCTTCC GGATTGTTTA CATAACCGGA aiAATCATA GGTCCTCTGA 

3490 3500 3510 3520 3530 3540 

CACATAATTC 6CCTCTCTGA TTAACGCCCA GCGTITTCCC GGTATCCAGA TCCACAACCT 

3550 3560 3570 3580 3590 3600 

TCGCTTCAftA AAATGGAAa ACTTTACCGA CC6C6CCCG6 TTTATCATCC CCCTC66GT6 

3610 3620 3630 3640 3650 3660 

TAATCAGAAT A6CT6ATGTA GTCTCA6T6A GCCCATATCC TTGTCGTATC CCTGGAA6AT 

3670 3680 3690 3700 3710 3720 

GGAAGC6TTT T6CAACCCCT TCCCCGACTT CTTTCGAAA6 AG6TGCGCCC CCAGAA6CAA 

3730 3740 3750 3760 3770 3780 

TTTCGTGTAA ATTAGATAAA TCGTATTTGT CAATCAGA6T GCTTTTGGCG AAGAATGAAA 

3790 3800 3810 3820 3830 3840 

ATA6G6TT66 TACTAGCAAC 6CACTTTGAA mrGTAATC CTGAAGGGAT CGTAAAAACA 

3850 3860 3870 3880 3890 3900 

6CTCTTCTTC AAATCTATAC ATTAAGACGA CTCGAAATCC ACATATCAAA TATCCGAGTG 
3910 3920 3930 3940 3950 3960 

TAGTAAAar TCCAAAACCG TGATGGAATG GAACAACACT TAAAATCCCA GTATCCGGAA 

3970 3980 3990 4000 4010 4020 

TGATTTGATT GCCAAAAATA 6GATCTCTG6 CAT6CGAGAA TCTGACGaG 6CAGTTCTAT 

4030 4040 4050 4060 4070 4080 

GCGGAAGGGC CACACCCTTA G6TAACCCAG TAGATCCAGA 6GAATTGTTT TGiaCGATC 

4090 4100 4110 4120 4130 4140 

AAAGGACTCT GGTACAAAAT CGTATTCATT AAAACC6GGA GGTAGATGAG ATGTGACGAA 

4150 4160 4170 4180 4190 4200 

CGTGTACATC GACTGAAATC CCTGGTAATC CGTTTTAGAA TCCATGATAA TAATTTTCT6 

4210 4220 4230 4240 4250 4260 

GATTATTGGT AATTTTTTTT CaCGTTCAA AATTTTTTGC AACCCCTTTT TGGAAACAAA 

4270 4280 4290 4300 4310 4320 

.CTACGGTA G6CTGCGAAA TGTTaTACT GTTGAGCAAT TCACGTTCAT TATAAATGTC 

4330 4340 4350 4360 4370 4380 

•GTTCGCGGGC GCAACTGCAA CTCCGATAAA TAACGCGCCC AAaCCGGCA TAAAGAATTG 

4390 4400 4410 4420 4430 4440 

AAGAGAGTTT TCACTGCATA CGACGATTCT GTGATTTGTA TTCAGCCCAT ATCGTTTCAT 

4450 4460 4470 4480 4490 4500 

AGCTTCTGCC AACCGAACGG ACATTTCGAA GTATTCCGCG TACAGCCCGG CC6TTTAAAC 

4510 4520 4530 4540 4550 4560 

GGCCGGGCTT CAATACCCTG ATTGACTGGA ACAGCTGTAG CCCTGAACAG CA6CGTGCGC 

4570 4580 4590 4600 4610 4620 

TGCTGACGCG TCC6GCGATT TCC6CCTCTG ACAGTATTAC CCG6ACGGTC AGCGATATTC 

4630 4640 4650 4660 4670 4680 

^GATAATGT AAAAAC6CGC 6GTGACGATG CCCTGCGTGA ATACAGCGCT AAATTTGATA 

4690 4700 4710 4720 4730 4740 

AAAaGAAGT GACA6C6CTA CGC6TCACCC CTGAA6A6AT CGCCGCCGCC GGCGCGCGTC 

4750 4760 4770 4780 4790 4800 

T6A6CGACGA ATTAAAACAG GCGATGACCG CTGCCGTCAA AAATATTGAA ACGTTCCATT 

4810 4820 4830 4840 4850 4860 

CCGC6CA6AC GCTACC6CCT GTAGATGTG6 AAACCCA6CC AGGCGTGC6T TGCCAGaGG 

4870 4880 4890 4900 4910 4920 

TTAC6CGTCC CGTCTC6TCT 6TC6GTCT6T ATA7TCCCGG CGGCTCGGCT CC6CTCTTCT 

4930 4940 4950 4960 4970 4980 

CAACG6T6CT 6ATGCTGGCG ACGCCGGCGC GCATTGCGGG ATGCaCAAG GTGGTTCTGT 

4990 5000 5010 5020 5030 5040 

GCTCGCCGCC GCCCATC6CT GATGAAATCC TCTATGC6GC GCAACTGTGT GGCGTGCAGG 

5050 5060 5070 5080 5090 5100 

AAATCTTTAA C6TC6GCGGC GCGaGGCGA TTGCCGCTCT GGCCTTCGGC AGCGAGTCCG 

5110 5120 5130 5140 5150 5160 

TACCGAAAGT GGATAAAATT TTTGGCCCCG GCAACGCCTT TGTAACCGAA 6CCAAAC6TC 



5170 5180 5190 5200 5210 5220 

A66TCA6CCA GCGTCTCGAC GGCGCGGCTA TCGATATGCC AGCC66GCCG TCTGAAGTAC 
5230 5240 5250 5260 5270 5280 

TCGTGATCGC A6ACAGCCGC GCAACACCGG ATTTCGTC6C TTCTGACCTG CTCTCCaCG 

5290 5300 5310 5320 5330 5340 

CTGAGaCGG CCCGGATTCC CAG6TGATCC T6CTGACGCC TGAT6CTGAC ATTGCCCGa 

5350 5360 5370 5380 5390 5400 

A6GTGGC6GA GGC66TACAA CGTCAACTGG CG6AACTGCC GCGCGCGGAC ACCGCCC6GC 

5410 5420 5430 5440 5450 5460 

AG6CCCTGAG CGCCA6TC6T CTGATT6T6A CCAAAGATTT AGCGaCTGC GTC6CCATCT 

5470 5480 5490 5500 5510 5520 

CTAATCA6TA TG6GCCGGAA CACTTAATa TCCAGAC6C6 CAAT6C6CGC GATTT66TGG 

5530 5540 5550 5560 5570 5580 

AT6CGATTAC CAGC6CA6GC TC66TATTTC TC66C6ACTG GTCGCCGGAA TCC6CCG6TG 

5590 5600 5610 5620 5630 5640 

ATTACGCTTC CG6AACCAAC CAT6TTTTAC CGACCTAT6G CTATACT6CT ACCTGTTCa 

5650 5660 5670 5680 5690 5700 

GCCTT666TT AGCG6ATTTC aGAAACGGA T6ACCGTTCA GGAACTGTCG AAAGCG66CT 

5710 5720 5730 5740 5750 5760 

TTTCCGCTCT GGCATCAACC ATT6AAACAT TG6CGGC66C AGAAC6TCTG ACCGCCCATA 

5770 5780 5790 5800 5810 5820 

AAAATGCCGT GACCCTGCGC GTAAACGCCC TCAAGGA6CA AGCATGAGU CTGAAAACAC 

5830 5840 5850 5860 5870 5880 

TCTCAGCGTC 6CTGACTTAG CCCGTGAAAA TGTCCGCAAC CT6GAGATCC AGACATGGAT 

5890 5900 5910 5920 5930 5940 

*A6ATACATT GAT6A6TTTG GACAAACCAC AACTAGAATG CAGTGAAAAA AATGCTTTAT 

5950 5960 5970 5980 5990 6000 

TT6TGAAATT T6T6ATGCTA TT6CTTTATT TGTAACCATT ATAA6CTGCA ATAAACAAGT 

6010 6020 6030 6040 6050 6060 

TAACAACAAC AATTGCATTC ATTTTATGTT TCAGGTTCAG G6GGAGCTGT GGGAGGTTTT 

6070 6080 6090 6100 6110 6120 

TTAAA6CAAG TAAAACCTCT AaAATGTGG TAT66CTGAT TATGATCTCT AG6GCCGGCC 

6130 6140 6150 6160 6170 6180 

CTCGAC6GCG CGTCTA6AGC AGT6TGGTTT TaAGAGGAA GUAAAAGCC TCTCCACCCA 

6190 6200 6210 6220 6230 6240 

66CCTG6AAT GTTTCCACCC AAT6TC6A6C A6T6T6GTTT T6CAA6AGGA AGCAAAAA6C 

6250 6260 6270 6280 6290 6300 

CTCTCCACCC AG6CCTG6AA T6TTTCCACC CAAT6TCGAG CAAACCCCGC CCAGCCTOT 

6310 6320 6330 6340 6350 6360 

6TCATTG6CG AATT66AACA C6CATATGCA GTC6666CG6 C6C66TCCCA GGTCaCTTC 

6370 6380 6390 6400 6410 6420 

GCATATTAAG GTGGC6CGTG TGGCCTCGAA CACCGAGCGA CCCTGCAGCC AATATGGGAT 

6430 6440 6450 6460 6470 6480 

C6GCCATTGA ACAAGATGGA TT6CAC6CAG 6TTCTCC6GC CGCTT6G6TC GAGAGGCTAT 



6490 6500 6510 6520 6530 6540 
TCGGCTATGA CTGGGCAaA CAGACAATCG GCTGCTCTGA TGCCGCCGTG TTCCGGCTGT 

6550 6560 6570 6580 6590 6600 

CAGCCaCGG GCGCCC6GTT CTTTTTGTCA AGACCGACCT 6TCCGGT6CC CTGAAT6AAC 

6610 6620 6630 6640 6650 6660 

TGCAG6TAAG TGCGGCCGTC GATGGCCGA6 GCGGCCTCGG CCTCTGCATA AATAAAAAAA 

6670 6680 6690 6700 6710 6720 

ATTAGTCA6C CATGCATGGG GCG6AGAATG GGCGGAACTG GGCGGAGTTA GGGGCGGGAT 

6730 6740 6750 6760 6770 6780 

G66CG6AGTT AG6GGCG66A CTATG6TT6C TGACTAATTG AGATGCATGC TTTGaTACT 

6790 6800 6810 6820 6830 6840 

TCT6CCT6CT G666A6CCTG GG6ACTTTCC ACACCT66TT GCTGACTAAT TGAGATGaT 

6850 6860 6870 6880 6890 6900 

'^CTTTGCATA CTTCTGCCTG CTGGG6A6CC IGGGGACTTT CaaCCCTA ACTGAaaC 

6910 6920 6930 6940 6950 6960 

ATTCCACAGA ATTAATTCCC CTAGTrATTA ATAGTAATCA ATTACGG6GT aTTAGTTCA 

6970 6980 6990 7000 7010 7020 

TAGCCCATAT AT6GA6TTCC GCGTTACATA ACTTACGGTA AATGGCCCGC CT6GCTGACC 

7030 7040 7050 7060 7070 7080 

GCCCAACGAC CCCC6CCCAT TGACGTCAAT AATGAC6TAT GTTCCCATAG TA/tCGCCAAT 

7090 7100 7110 7120 7130 7140 

ACGGACtTTC CATTGAC6TC AATGG6T6GA GTATTTAC6G TAAACT6CCC ACTTGGaGT 

7150 7160 7170 7180 7190 7200 

ACATCAAGTG TATCATATGC CAAGTACGCC CCCTATTGAC GTCAATGACG GTAAATGGCC 

7210 7220 7230 7240 7250 7260 

'''5CCTG6CAT TATGCCCAGT ACATGACCTT ATGGGACTTT CCTACTTGGC AGTACATCTA 

7270 7280 7290 7300 7310 7320 

C6TATTA6TC ATCGCTATTA CCAT6GT6AT GCGGTnTGG CA6TACATCA ATGGGCGT6G 

7330 7340 7350 7360 7370 7380 

ATA6CG6TTT 6ACTCAC66G GATTTCCAA6 TCTCCACCCC ATTGAC6TCA ATGGGAGTTT 

7390 7400 7410 7420 7430 7440 

GTTTTGGCAC CAAAATCAAC GGGACTTTCC AAAATGTCGT AACAACTCC6 CCCaiTGAC 

7450 7460 7470 7480 7490 7500 

GCAAATGGGC GGTAGGCGTG TACGGTGGGA GGTCTATATA AGCAGAGCTG GGTACGTGAA 

7510 7520 7530 7540 7550 7560 

CCGTCAGATC GCCT66AGAC 6CCATCACAG ATCTCTCACC ATGGACAT6A 66GTCCCC6C 

7570 7580 7590 7600 7610 7620 

TCA6CTCCTG G6GCTCCTTC TGCTCTGGCT CCCA66T6CC AGATGT6ACA TCCAGAT6AC 

7630 7640 7650 7660 7670 7680 

CaGTCTCa TCTTCCCTCT CTGaTCTGT AGGGGAMGA GTCACCATa CTTGaGGGC 

7690 7700 7710 7720 7730 7740 

AAGTaGGAC ATTAGGTATT ATTTAAATTG GTATCAGCAG AAACCAGGAA AAGCTCCTAA 

7750 7760 7770 7780 7790 7800 

GCTCCTGATC TATGTTGCAT CCAGTTTGa AAGTGGGGTC CCATCAAGGT TCAGCGGaG 
7810 7820 7830 7840 7850 7860 

TG6ATCTGG6 ACAGAiTTTCA CTCTCACCGT CAGCAGCCTG CAGCCTGAAG ATnTGCGAC 

7870 7880 7890 7900 7910 7920 

TTATTACTGT CTAaGGTTT ATAGTACCCC TC6GACGTTC GGCUAGGGA CCAA6GT66A 

7930 7940 7950 7960 7970 7980 

AATCAAACGT ACGGT6GCTG CACCATCT6T CTTCATCTTC CCGCaTCTG ATGAGCAGTT 

7990 8000 8010 8020 8030 8040 

GAAATCTG6A ACTGCCTCTG TTGTGT6CCT 6CTGAATAAC TTCTATCCCA GA6A6CCCAA 

8050 8060 8070 8080 8090 8100 

AGTACAGTGG AAGGT6GATA AC6CCCTCCA ATCGG6TAAC TCCCA66AGA GTGTaCAGA 

8110 8120 8130 8140 8150 8160 

GCAG6ACAGC AAGGACAGa CCTACAGCCT CAGCAGCACC CTGAC6CTGA 6CAAAGCAGA 

8170 8180 8190 8200 8210 8220 

^-ACGAGAAA CACAAA6TCT ACGCCT6C6A AGTaCCCAT CA666CCT6A GCTCCCCC6T , 

8230 8240 8250 8260 8270 8280 

CACAAAGAGC TTCAAaGGG 6AGAGTGTTG AATTCAGATC CGTTAACG6T TACCAACTAC 

8290 8300 8310 8320 8330 8340 

CTAGACTGGA TTCGTGACAA CATGCG6CCG TGATATCTAC GTATGATCAG CCTCGACTGT 

8350 8360 8370 8380 8390 8400 

GCCTTCTAGT TGCCAGCCAT CT6TTGTTTG CCCCTCCCCC GTGCCTTCCT TGACCCTCGA 

8410 8420 8430 8440 8450 8460 

A66TGCCACT CCCACTGTCC TTTCCTAATA AAATGACGAA ATTGaTCGC ATTGTCTGA6 

8470 8480 8490 8500 8510 8520 

TAGGTCTCAT TCTATTCTG6 C6GGTGG6GT GG6GCAGGAC A6CAAG6GGG AGGATTGGGA 

8530 8540 8550 8560 8570 8580 

ACAATAGC AGGaTGCTG GGGATGCGGT GGGCTiCTATG GCTTCT6AGG CG6AAAGAAC 

8590 8600 8610 8620 8630 8640 

CAGCTGGGAC TAGTCGCAAT TGG6CGGAGT TAGGGGCGGG AT6GGCG6AG TTAGG6GCG6 

8650 8660 8670 8680 8690 8700 

(6ACTATGGTT CCTGACTAAT TGAGATGaT GCTTTGaTA CTTCTGCCTG CTGGG6AGCC 

8710 8720 8730 8740 8750 8760 

TGGGGACTTT CCACACCTGG TT6CT6ACTA ATT6A6AT6C ATGCTTTGCA TACTTCT6CC 

8770 8780 8790 8800 8810 8820 

TfiCTGGGGAG CCTGGG6ACT TTCaaCCC TAACTCACAC ACATTCCACA GAATTAATTC 

8830 8840 8850 8860 8870 8880 

CCCTA6TTAT TAATA6TAAT CAATTAC6GG GTaTTAGTT CATA6CCCAT ATAT6GA6TT 

8890 8900 8910 8920 8930 8940 

CCGCGTTAa TAACTTACGG TAAATG6CCC GCCTGGCTGA CCGCCCAACG ACCCCCGCCC 

8950 8960 8970 8980 8990 9000 

ATTGACGTCA ATAATGACGT ATGTTCCaX AGTAACGCa ATAGGGACTT TCaTTGACG 

9010 9020 9030 9040 9050 9060 

TaATGGGTG GAGTATTTAC 6GTAAACTGC CCACTTGGa GTACATCAAG TGTATCATAT 



9070 9080 9090 9100 9110 9120 

GCCAA6TAC6 CCCCCTATTG AC6TCAATGA CG6TAAATG6 CCCGCCTGGC ATTAT6CCU 
9130 9140 9150 9160 9170 9180 

GTACAT6ACC TTATC6CACT TTCCTACTTG CCACTACATC TAC6TATTAC TaTCCCTCT 

9190 9200 9210 9220 9230 9240 

TACCATG6T6 ATGCGGTTTT 6GCACTACAT CAATC66C6T GGATA6CG6T TTGACTCACG 

9250 9260 9270 9280 9290 9300 

GG6ATTTCCA A6TCTCCACC CCATTGACGT CAAT6G6A6T TTCTTTrGGC ACCAAAATa 

9310 9320 9330 9340 9350 9360 

ACGGGACTTT CCAAAATGTC GTAACAACTC CGCCCCATTG AC6CAAAT6G 6CGGTA6GC6 

9370 9380 9390 9400 9410 9420 

TGTAC66T66 GAGGTCTATA TAAGCAGAGC TGGGTAC6TG AACCGTCACA TCGCCTGaC 

9430 9440 9450 9460 9470 9480 

AC6CC6TCGA CAT666TT6G AGCCTaTCT T6CTCTTCCT T6TC6CT6TT 6CTAC6C6T6 

9490 9500 9510 9520 9530 9540 

• CCTGTCCGA GGTGaCCTG GTG6A6TCTG GGG6CG6CTT 6GCAAAGCCT GGGG6GTCCC 

9550 9560 9570 9580 9590 9600 

TGA6ACTCTC CTGCGaGCC TCCGGGTTa GGTTCACdT CAATAACTAC TACATG6ACT 

9610 9620 9630 9640 9650 9660 

6GGTCCGCCA GGCTCCAGGG CA6GGGCTG6 A6TGG6TCTC ACCTATTA6T A6TAGTG6TG 

9670 9680 9690 9700 9710 9720 

ATCCaCATG 6TACGCAGAC TCC6TGAAGG GCA6ATTCAC CATCTCaGA 6AGAACGCCA 

9730 9740 9750 9760 9770 9780 

AGAACACACT GTTTCTTCAA ATGAAaGCC T6AGA6CTGA 66ACAC6GCT GTCTftTTACT 

9790 9800 9810 9820 9830 9840 

GTGCGAGCTT 6ACTACAGGG TCTGACTCCT 66GGCCAGGG A6TCCTG6TC ACCGTCTCCT 

9850 9860 9870 9880 9890 9900 

aCCTAGCAC CAA6GGCCCA TC66TCTTCC CCCTGGCACC CTCCTCCAAG A6CACCTCTC 

9910 9920 9930 9940 9950 9960 

GGGGCAUGC GGCCCTGGGC TGCCTGGTa AG6ACTACTT CCCC6AACC6 GT6ACGGT6T 

9970 9980 9990 10000 10010 10020 

CGT66AACTC AGGCGCCCTG ACCAGCGGCG TGaaCCTT CCC66CTGTC CTACAGTCCT 

10030 10040 10050 10060 10070 10080 

aCGACTCTA CTCCCTCAGC A6C6TGGTGA CCGT6CCCTC CAGCAGCITG GGaCCCAGA 

10090 10100 10110 10120 10130 10140 

CCTAarCTG CAAC6T6AAT CACAA6CCCA GCAAaCCAA GGTGGACAAG AAAGTTGAGC 

10150 10160 10170 10180 10190 10200 

CaAATCTTG TGACAAAACT CACAaTGCC CACC6T6CCC AGaCCTGAA CTCCTGGGGG 

10210 10220 10230 10240 10250 10260 

GACCGTCA6T CTrCCTCTTC CCCCaAAAC CCAAGGAaC CCTCAT6ATC TCCC66ACCC 

10270 10280 10290 10300 10310 10320 

CTGAGGTCAC AT6C6T6GT6 GTGGACGT6A GCCACGAAGA CCCT6A6GTC AAGTTaACT 

10330 10340 10350 10360 10370 10380 

GGTACGT6GA CCGC6TGGAG GTCaiAATG CCAAGACAAA 6CC6C6G6AG GAGCAGTAa 



10390 10400 10410 10420 10430 10440 
ACAGCACGTA CCGTCTGGTC AGCGTCCTCA CCCTCCTGa CCACGACTCG CTGAATGCa 

10450 10460 10470 10480 10490 10500 

AC6A6TACAA CTGCAAGCTC TCCAACAAAG CCCTCCCAGC CCCaTCGAG AAAACaTCT 

10510 10520 10530 10540 10550 10560 

CaAAGCCAA A6GGCAGCCC CGA6AACCAC AGGTGTAaC CCTGCCCCCA TCCCG6GAT6 

10570 10580 10590 10600 10610 10620 

A6CTGACCAA GAACaGGTC AGCCT6ACCT GCCT6GTCAA AGGOTCTAT CCCAGCGAa 

10630 10640 10650 10660 10670 10680 

TC6CCGTGGA GTGGGAGA6C AATGG6CAGC CGGAGAACAA CTACAAGACC AC6CCTCCCG 

10690 10700 10710 10720 10730 10740 

TGCT6GACTC CCACGGCTCC TTCTTCCTCT ACAGCAAGCT CACC6T66AC AAGACaGCT 

10750 10760 10770 10780 10790 10800 

'^GCAGCAGGG GAACGTCTTC TaTGCTCCG T6AT6CAT6A GGCTCTGaC AACaCTAa 

10810 10820 10830 10840 10850 10860 

C6CAGAAGA6 CCTCTCCCT6 TCTCCG6GTA AAT6A66ATC CGTTAACG6T TACCAACTAC 

10870 10880 10890 10900 10910 10920 

CTAGACT6GA TTCGTGACAA CATGC6GCC6 TGATATCTAC GTATGATCAG CCTCGACT6T 

10930 10940 10950 10960 10970 10980 

6CC1TCTAGT TGCCAGCCAT CTGTTGTTTG CCCCTCCCCC 6T6CCTTCCT TGACCCTGGA 

10990 11000 11010 11020 11030 11040 

A6GTGCCACT CCCACTGTCC TTTCCTAATA AAATGAGGAA ATTGaTCGC ATTGTCTGAG 

11050 11060 11070 11080 11090 11100 

TAGGT6TCAT TCTATTCTG6 GGGGTGGG6T GGGGCAGGAC AGCAAGGGGG AGGATTGGGA 

11110 11120 11130 11140 11150 11160 

'"^ACAATAGC A6GCATGCTG 6GGATGCGGT GG6CTCTATG 6CTTCTCAGG C6GAAAGAAC 

11170 11180 11190 11200 11210 11220 

aCCTGGGGC TCGAaGCAA CGCTAGGTCG AGGCCGCTAC TAACTCTCTC CTCCCTCCTT 

11230 11240 11250 11260 11270 11280 

TTTCCTGaG GACGA6GCAG C6C6GCTATC GT66CT6GCC ACGAC66GCC TTCCTTGCCC 

11290 11300 11310 11320 11330 11340 

A6CTGTGCTC GACGTTGTCA CTGAA6C6GG AA6GGACTG6 CTGCTATTGG GCGAA6T6CC 

11350 11360 11370 11380 11390 11400 

666GCAGGAT CTCCTGTaX CTCACCTT6C TCCTGCCGA6 AAAGTATCCA laiGGCTGA 

11410 11420 11430 11440 11450 11460 

TCCAAT6C66 CGGCTGaTA C6CTF6ATCC G6CTICCT6C CCATTCGACC ACCU6CGAA 

11470 11480 11490 11500 11510 11520 

AarCGCATC GAGCGACaC GTACTC6GAT G6AA6CC66T nTCTCGATC A6GATGATCT 

11530 11540 11550 11560 11570 11580 

GGACGAAGA6 aTCAGGGGC TCGCGCCAGC CGAACTGTTC GCaGGTAAG TGA6CTCCAA 

11590 11600 11610 11620 11630 11640 

TTCAA6CTCT C6AGCTAGGG CGGCaGCTA GTAGCT7TGC TTCTCAATIT CTTATTTGa 

11650 11660 11670 11680 11690 11700 

TAATGA6AAA AAAAGGAAAA TTAATTTTAA CACCAATTCA GTAGTTGATT 6A6CAAATGC 
11710 11720 11730 11740 11750 11760 

GTTCCCAAAA AGGATGCTTT AGACACA6T6 TTCTCTSaC ACATAAGGAC AAACATTATT 

11770 11780 11790 11800 11810 11820 

CA6AG66AGT ACCaGAGCT GAGACTCCTA AGCCAGTGAG TGGCACAGCA TCaCGGAGA 

11830 11840 11850 11860 11870 11880 

AATATGCTT6 TaiacCGA AGCCTGATTC CGTA6AGCCA UCCCT6GTA AGGGCCAATC 

11890 11900 11910 11920 11930 11940 

T6CTCACACA GGATAGAGAG GGaCGAGCC AGGCaCAGC ATATAAG6T6 AGGTUGCATC 

11950 11960 11970 11980 11990 12000 

A6TTGCTCCT CACATTTGCT TCTGACATAG TTGTGTT6GG AGCTTG6ATA GCTTGG666G 

12010 12020 12030 12040 12050 12060 

G66ACAGCTC AGGGCTGCGA TTTCGCGCa AACTTGACGG CAATCCTAGC GT6AA6GCTG 

12070 12080 12090 12100 12110 12120 

•A66ATTTT ATCCCCGCTG CaxaTCGT TCGACCATTG AACTCaTCG TC6CCGT6TC 

12130 12140 12150 12160 12170 12180 

.CCAAAATAT6 GGGATTGCa AGAACGGAGA CCTACCCT66 CCTCCGCTa GGAACGAGTT 

12190 12200 12210 12220 12230 12240 
CAAGTACTTC CAAAGAAT6A CCACAACCTC TTCAGTGGAA GGTAAACAGA ATCT66T6AT 

12250 12260 12270 12280 12290 12300 

TATGG6TAGG AAAACCTGGT TCTCCATTCC TGA6AA6AAT C6ACCTTTAA AGGAaGAAT 

12310 12320 12330 12340 12350 12360 

TAATATA6TT CTCAGTAGAG AACTCAAAGA ACaCCACGA 6GAGCTCATT TTCTTCCaA 

12370 12380 12390 12400 12410 12420 

AA6TTT6GAT GAT6CCTTAA CGTA66CGC6 CCATTAAGAC TTATTGAACA ACCGGAATT6 

12430 12440 12450 12460 12470 12480 

:AA6TAAA6 TAGAaTGGT TT6GATA6TC GGAGGaCTT CT6TTTACCA 6GAAGCCAT6 

12490 12500 12510 12520 12530 12540 

AATCAACCA6 GCCACCTCAG ACTCTTTGTG ACAA66ATCA TGCAGGAATT TGAAAGTGAC 

12550 12560 12570 12580 12590 12600 

AC6TTTTTCC CAGAAATT6A TTTGCGGAAA TATAAACTTC TCCCAGAATA CCCAG6CGTC 

12610 12620 12630 12640 12650 12660 

CTCTCT6A6G TCCA66A66A AAAAGGaTC AA6TATAAGT TT6AA6TCTA CGAGAA6AAA 

12670 12680 12690 12700 12710 12720 

fiACTAAaCC AA6AT6CTTT CAA6TTCTCT 6CTCCCCTCC TAAA6CTATG aTTTTTATA 

12730 12740 12750 12760 12770 12780 
A6ACCAT666 ACmTCCTS GCTmGATC A6CCTCGACT 6TGCCTTCTA GTTGCaGCC 

12790 12800 12810 12820 12830 12840 

ATCTGTTGTT TGCCCCTCCC CC6T6CCTTC CTTGACCCTG GAAGGTGCCA CTCCaCTGT 

12850 12860 12870 12880 12890 12900 

CCTTTCCTAA TAAAATGAGG AAATTGCATC GCATTGTCTG AGTAGGTGTC ATTCTATTCT 

12910 12920 12930 12940 12950 12960 

G66666TG6G GTGGGGCAGG ACAGCAA66G GGAGGATTGG GAAGAaATA 6CAGGCATGC 

12970 12980 12990 13000 13010 13020 

T666GAT6C6 6T666CTCTA TGGCTTCTGA G6CG6AAAGA ACCAGCT6GG GCTCGAA6C6 
13030 13040 13050 13060 13070 13080 

6CCGCCCATT TC6CT66T66 TaCATGCGG CATC6CGTGG GAC6C6GCG6 6GA6CGTCAC 

13090 13100 13110 13120 13130 13140 

ACTGAGGnr TCC6CCAGAC GCCACT6CT6 CCAGGC6CTG ATGT6CCCGG CTTCTGACCA 

13150 13160 13170 13180 13190 13200 

TGC6GTCGC6 TTCGGTTGCA CTAC6CGTAC TGT6ACCCA6 AGTT6CCCG6 CGCTCTCCGG 

13210 13220 13230 13240 13250 13260 

CT6CGGTAGT TaGGaGTT CAATCAACTG TTTACCTTGT GGAGCGACAT CaGAGGaC 

13270 13280 13290 13300 13310 13320 

TTCACCGCTT GCCAGCGGCT TACaTCCAG CGCaCCATC aCTGCAGGA GCTC6TTATC 

13330 13340 13350 13360 13370 13380 

GCTATGACGG AACAGGTATT CGCTGGTCAC TTC6ATG6TT TGCCCGGATA AACGGAACTG 

13390 13400 13410 13420 13430 13440 

wJUAACTGC TGCTGGT6TT TTGCTTCCGT CAGCGCTGGA TGCGGCGTGC GGTCGGCAAA 

13450 13460 13470 13480 13490 13500 

GACCA6ACCG TTCATACA6A ACTG6CGATC 6TTCGGC6TA TCGCCAAAAT CACCGCCGTA 

13510 13520 13530 13540 13550 13560 

AGCCGACCAC GGGTTGCCGT TTTaTaTA TiTAATCAGC GACTGATCCA CCCAGTCCa 

13570 13580 13590 13600 13610 13620 

GAC6AAGCCG CCCTGTAAAC GG6GATACT6 ACGAAACGCC TGCCAGTATT TAGCGAAACC 

13630 13640 13650 13660 13670 13680 

6CCAA6ACTG TTACCCATCG CGT666C6TA TTCGCAAAGG ATaGCGGGC GCGTCTCTCC 

13690 13700 13710 13720 13730 13740 

A66TAGCGAA AGCCA I HI! TGATGGACCA TTTCGGCACA 6CCCGGAAGG GCTGGTCTTC 

13750 13760 13770 13780 13790 13800 

aTCCACGCGC GCGTACATCG GGCAAATAAT ATCGGTGGCC GTGGTGTCGG CTCCGCCGCC 

13810 13820 13830 13840 13850 13860 

TTCATACTGC ACCGGGCGGG AAGGATCGAC AGATTTGATC CAGCGATACA GCGCGTC6TG 

13870 13880 13890 13900 13910 13920 

ATTAGCGCC6 TG6CCTGATT aiTCCCCAG CGACCAGATG ATaCACTCG GGTGATTAC6 

13930 13940 13950 13960 13970 13980 

ATC6CGCTGC ACCATTCGCG TTACGC6TTC GCTCATCGCC G6TA6CCAGC GCGGATCATC 

13990 14000 14010 14020 14030 14040 

66TCAGACGA TTaTTGGCA CaiGCCGTG GGTTtCAATA TTGGCTTCAT CacaaTA 

14050 14060 14070 14080 14090 14100 

CA6GCCGTA6 CGGTCGCACA GC6T6TACCA CAGC6GAT6G TTCGGATAAT GCGAAaCCG 

14110 14120 14130 14140 14150 14160 

aCGGCGTTA AA6TTGTTCT GOTCATaG aCGATATCC TGacaTCG TCTGCTCATC 

14170 14180 14190 14200 14210 14220 

CATGACCTGA CCATGCAGAG GATGATGCTC GT6AC6GTTA ACGCCTCGAA TaGCAACGG 

14230 14240 14250 14260 14270 14280 

CTTGCCGTTC AGCAGCAGCA GACaTTTTC AATCCGCACC TC6CGGAAAC CGACATCGCA 

14290 14300 14310 14320 14330 14340 
GGCTTCTGCT TCAATCAGC6 T6CCGTCGGC 6GTCTGCACT TCAACCACCG CAC6ATAGAG 

14350 14360 14370 14380 14390 14400 

ATTCGGGATT TCGGCGCTCC AaGTTTCGG GTTITCGACG TTaGACGTA GTGTGACGC6 

14410 14420 14430 14440 14450 14460 

ATCGGCATAA CCACCACGCT CATC6ATAAT TTaCCGCCG AAAG6C6CGG TGCC6CTGGC 

14470 14480 14490 14500 14510 14520 

GACCTGC6TT TaCCCTGCC ATAAAGAAAC TGTTACCC6T AGGTAGTaC 6CAACTC6CC 

14530 14540 14550 14560 14570 14580 

GaCATCTGA ACTTaCCCT CaGTACAGC GCGGCTGAAA TCATaTTAA AGC6AGTGGC 



14590 14600 14610 14620 14630 14640 

AACATGGAAA TCGCTGATTT GTGTAGTCGG TTTATCaGC AACGA6ACGT aCGGAAAAT 

14650 14660 14670 14680 14690 14700 

''''C6CTCATC CGCCAaTAT CCTGATCTTC aGATAACTG CCGTCACTCC AGCGaCCAC 

14710 14720 14730 14740 14750 14760 

arCACCGCG AGGCGGTTTT CTCC6GCGC6 TAAAAATGCG CTaGGTCAA ATTCAGACGG 

14770 14780 14790 14800 14810 14820 

CAAACGACT6 TCCTGGCC6T AACCGACCa GCGCCCGTTG CACCACAGAT GAAAC6CC6A 

14830 14840 14850 14860 14870 14880 

GTTAACGCCA TaAAAATAA TTCGCGTCTG GCCTTCCTGT AGCCAGCTIT CATCAAaiT 

14890 14900 14910 14920 14930 14940 

AAATGTGAGC GAGTAACAAC CCGTCGGATT CTCCGTGGGA ACAAACGGCG GATTGACCGT 

14950 14960 14970 14980 14990 15000 

AATGGGATAG GTCACGTTGG TGTAGATG6G C6CATCGTAA CCGTGaTCT GCaGTTTGA 

15010 15020 15030 15040 15050 15060 

''GGACGACG AaGTATCGG CCTaGGAAG ATCCaCTCC AGCCAGCTIT CCGGaCCGC 

15070 15080 15090 15100 15110 15120 

TrCTGGTGCC 6GAAACCAG6 CAAAGC6CCA TTCGCaTTC AGGCTGCGa ACTGTTGGGA 

15130 15140 15150 15160 15170 15180 

A6G6C6ATCG GT6C66GCCT CTTCGCTATT ACGCaCCTG GCGAAAGG6G GATGTGCT6C 

15190 15200 15210 15220 15230 15240 

AAGGCGATTA AGTT666TAA CGCaGGCTT TTCCaCTCA CGACGTTGTA AAACGACTTA 

15250 15260 15270 15280 15290 15300 

ATCC6TCGAG GGGCTGCCTC GAAGCAGACG ACCTTCCGTT GTCaGCaG CGGC6CCTGC 

15310 15320 15330 15340 15350 15360 

GCC66T6CCC ACAATCGTGC 6C6AACAAAC lAAACaOA CAAATTATAC CGGCGCaCC 

15370 15380 15390 15400 15410 15420 

GCCGccAca CCITCTCCC6 TGCCTAACAT TcaGCGCCT cacaccAC acucaTc 

15430 15440 15450 15460 15470 15480 

GAT6TCTGAA TTGCCGCCCG CTCacCAAT GCCGACG6AA CCTCAACCCG CTGCACCTTT 

15490 15500 15510 15520 15530 15540 

A6AC6ACAGA CAACAATTGT TGGAAGCTAT TAGAAAC6AA AAAAATCGCA CTCGTCTCAG 



15550 15560 15570 15580 15590 15600 

ACCG6TCAAA CCAAAAAC66 CGCCCGAAAC CA6TACAATA GTTGAGGTGC CGACTGTGTT 
15610 15620 15630 15640 15650 15660 

GCCTAAA6AG ACATTTGACC CTAAACCGCC GTCTCaxa CCGCacaC CTCCGCCTCC 

15670 15680 15690 15700 15710 15720 

GCCTCC6CC6 CCAGCCCC6C CTCCGCCTCC ACC6ATG6TA GATTOTCAT aGCTCCACC 

15730 15740 15750 15760 15770 15780 

ACC6CC6CCA TTAGTA6ATT TGCC6TCTGA AATGTTACa CCGCCTGCAC aTCGCTTTC 

15790 15800 15810 15820 15830 15840 

TAACGTGTTG TCTGAATTAA AATC666CAC A6TTAGATTG AAACCCGCCC AAAAAC6CCC 

15850 15860 15870 15880 15890 15900 

GCAATCA6AA ATAATTCCAA AAAGCTCAAC TACAAATTTG ATC6C6GAC6 TGTTAGCC6A 

15910 15920 15930 15940 15950 15960 

CACAATTAAT AGGCGTC6T6 TG6CTATGGC AAAATCGTCT TC66AA6CAA CTTCTAACCA 

15970 15980 15990 16000 16010 16020 

•A666TT6G GAC6ACGAC6 ATAATC6GCC TAATAAA6CT AAaCGCCCG ATGTTAAATA 

16030 16040 16050 16060 16070 16080 

.T6TCCAAGCT ACTAGTGGTA CC6CTT6GCA GAAaTATCC ATC6C6TCCG CCATCTCCAG 

16090 16100 16110 16120 16130 16140 

CAGCCGCACG CGGCGCATCT CGGGCAGCGT TGGGTCCTG6 caCGGGTGC GCAT6ATCGT 

16150 16160 16170 16180 16190 16200 

GCTCCTGTCG TTGAGGACCC GGCTAGGCT6 GCGGGGTT6C CTTACTGGTT A6CAGAATGA 

16210 16220 16230 16240 16250 16260 

ATCACCGATA CGC6AGCGAA CGTGAAGCGA CTGCTGCTGC AAAAC6TCTG C6ACCTGA6C 

16270 16280 16290 16300 16310 16320 

AACAACATGA AT6GTCTTC6 GTTTCC6TGT TTXGTAAACT CTGGAAAC6C 6GAA6TCA6C 

16330 16340 16350 16360 16370 16380 

:CCT6CACC ATTATCTTCC GGATCT6CAT CGCA6GAT6C TGCTGGCTAC CCTGTGGAAC 

16390 16400 16410 16420 16430 16440 

ACCTACATCT GTATTAAC6A AGC6CTG6CA TtGACCCTGA GTGATmTC TCT6GTCCCG 

16450 16460 16470 16480 16490 16500 

CCGaTCCAT ACCGCCAGTT GTTTACCCTC ACAACGTTCC ACTAACCCGG aTGTTCATC 

16510 16520 16530 16540 16550 16560 

AiaCTAACC CGTATC6T6A GCATCCTCTC TCCTTTCATC GClATaTTA CCCCaXGAA 

16570 16580 16590 16600 16610 16620 

aCAAATCCC CCTTAaCGG AGGaTCAGT GACCAAAaG GAAAAAACCG CCCTTUCAT 

16630 16640 16650 16660 16670 16680 
G6CCCGCTTT ATaGAAGCC AGAaTTAAC 6CTTCT66AG AAACTCAACG A6CTGGACGC 

16690 16700 16710 16720 16730 16740 

G6ATGAACAG GCAGAaTCT GTGAATCGCT TCACGACaC 6CTGATGAGC TTTACCGCAG 

16750 16760 16770 16780 16790 16800 

CT6CCTC6C6 CGTTTCGGTG ATGACGGTGA AAACCTCTGA aaTCaGC TCCC66AGAC 

16810 16820 16830 16840 16850 16860 

GGTCAaGCT T6TCT6TAA6 CGGAT6CCGG GAGCAGACAA GCCCGTaGG GCGCGTUGC 

16870 16880 16890 16900 16910 16920 

6GGT6TTG6C G6GT6TCGGG GC6CAGCCAT GACCCA6TCA CGTAGCGATA GCGGAGTGTA 
16930 16940 16950 16960 16970 16980 

TACTCCCTTA ACTATCCGGC ATaCAGCAG ATT6TACTCA GAGTCCACCA TAT6CG6T6T 

16990 17000 17010 17020 17030 17040 

GAAATACC6C ACAGATGC6T AAGGA6AAAA TACC6CATCA GGC6CTCTTC CiSCTKCTCG 

17050 17060 17070 17080 17090 17100 

CraCTGACT CGCT6C6CTC 66TCGTTC66 CT6C66CGA6 CGGTATaGC TaCTCAAAG 

17110 17120 17130 17140 17150 17160 

GC6GTAATAC GGTTATCaC A6AATCA666 GATAACGCAG GAAAGAACAT GT6A6CAAAA 

17170 17180 17190 17200 17210 17220 

GGCaGCAAA AGGCaCGAA CCGTAAAAA6 6CCGC6TT6C TGGCGTTTTT CaTAGGCTC 

17230 17240 17250 17260 17270 17280 

CGCCCCCCT6 ACGACaTCA CAAAAATCGA CGCTCAACTC AGA6GTGGCG AAACCCGAa 

17290 17300 17310 17320 17330 17340 

GGACTATAAA 6ATACCA6GC CTTTCCCCCT 6GAA6CTCCC TC6T6CGCTC TCCTGTTCCa 

17350 17360 17370 17380 17390 17400 

ACCCT6CCCC TTACCGGATA CCT6TCCGCC TTTCTCCCTT C666AA6CGT G6CCCTTTCT 

17410 17420 17430 17440 17450 17460 

aTAGCraC GCTGTAGGTA TCTCAGTTCG GTGTAGGTCG TTCGCTCCAA GCTGG6CT6T 

17470 17480 17490 17500 17510 17520 

GTCaCGAAC. CCCCC6TTCA 6CCCGACC6C TGCGCCTTAT CCGGTAACTA TCGTCTTGAG 

17530 17540 17550 17560 17570 17580 

TCCAACCC66 TAAGAaCGA CTTATCGCCA CTGGaGaG CaCTGGTAA aGGATTAGC 

17590 17600 17610 17620 17630 17640 

AGA6CCA66T ATGTAG6C66 T6CTACA6AG TTCTTGAAGT 6GT6GCCTAA CTACGGCmC ; 

17650 17660 17670 17680 17690 17700 

ACTAGAA6GA aGTATTTGG TATCTGCGCT CTGCTGAA6C CAGTTACCTT CGGAAAAAGA 

17710 17720 17730 17740 17750 17760 

GTT6GTAGCT CTTGATCCGG CAAACAAACC ACCGCTGGTA GCGGT6GTTT .TTTTGTTTGC 

17770 17780 17790 17800 17810 17820 

AA6CA6CA6A 1TAC6C6CA6 AAAAAAAGGA TCTCAA6AAG ATCCTTTGAT CTTTTCTACG 

17830 17840 17850 17860 17870 17880 

GGGTCTGACG CTCAGTGCAA CGAAAACTCA CGTIAAGGGA ITTTGGTaT GAGATTATCA 

17890 17900 17910 17920 17930 17940 
AAAAGCATCT TaCCIAGAT CCTTTTAAAT TAAAAAT6U 6TTTTUATC AATCTAAAGT 

17950 17960 17970 17980 17990 18000 

ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGOIAA TCA6T6A66C ACCTATCTCA 

18010 18020 18030 18040 18050 18060 

6C6ATCT6TC TATTTCGTTC ATCaTAGTT GCCT6ACTCC CCGTCGTGTA GATAACTACG 

18070 18080 18090 18100 18110 18120 

AlACGGGAGG GCTTACCATC TGGCCCCAGT GCT6CAATGA TACCGCGAGA CCCACGCTCA 

18130 18140 18150 18160 18170 18180 

CCGGCTCaG ATTTATCAGC AATAAACCA6 CCAGCCG6AA GGGCC6AGC6 CAGAAGTG6T 



18190 18200 18210 18220 18230 18240 
CdrCCAAC^TATCCGCCTC aTCCACTCT ATTAATTCTT 6CC66GAAGC TA6A6TAA6T 



18250 18260 18270 18280 18290 18300 

A6TTCGCCAG TTAATAGTTT GC6CAACGTT GTTGCaTTG CTGaCGCAT CGTGGTCTa 

18310 18320 18330 18340 18350 18360 

C6CTCGTCGT TTG6TAT6GC TTCATTCAGC TCC6GTTCCC AACGATCAAG GCGA6TTACA 

18370 18380 18390 18400 18410 18420 

TGATCCCCCA TGTT6T6CAA AAAAGCGGTT AGCTCCTTC6 6TCCTCCGAT CGTTGTaCA 

18430 18440 18450 18460 18470 18480 

AGTAAGTTGG CC6CAGTGTT ATaCTCATG GTTATG6CA6 aCTGCATAA TTCTCTTICT 

18490 18500 18510 18520 18530 18540 

GTaTGCCAT CCGTAAGATG CmTCTGTG ACTCCT6ACT ACTCUCCAA 6TCATTCT6A 

18550 18560 18570 18580 18590 18600 

6AATAGTGTA TGC66C6ACC GA6TT6CTCT TGCCC6GC6T CAAaCGGGA TUTACC6C6 

18610 18620 18630 18640 18650 18660 

cCAaTAGCA GAACTTTAAA AGTGCTCATC ATT66AAAAC GTTCTTC6GG GCGAAAACTC 

18670 18680 18690 18700 18710 18720 

TCAAGGATCT TACCGCTGTT CAGATCCA6T TCGATGTAAC CCACTCGT6C ACCCAACTGA 

18730 18740 18750 18760 18770 18780 

TCTTCAGCAT CTTTTACTTT acaCCGTT TCTGGGTGA6 CAAAAACAGG AAGGCAAAAT 

18790 18800 18810 18820 18830 18840 

GCC6CAAAAA AGG6AATAAG GGCGACACGG AAATGTTGAA TACTCATACT CTTCmiii 

18850 18860 18870 18880 18890 18900 

CAATATTATT 6AAGCATTTA TaCGGTTAT TGTCTCATGA GCGGATACAT ATTT6UT6T 

18910 18920 18930 18940 18950 18960 

ATTTAGAAAA ATAAACAAAT A666GTTCC6 CGaCATTTC CCC6AAAA6T GCaCCTGAC 

18970 18980 18990 19000 19010 19020 

GTCTAA6AAA CCATTATTAT CATGACATTA ACCTATAAAA ATA6CC6TAT CACGAG6CCC 



19030 19040 19050 19060 19070 19080 
TTTCGTCTTC AA6AA 
Title of the Invention 



METHOD FOR INTEGRATING GENES AT SPECIFIC SITES IN MAMMALIAN CELLS VIA 
HOMOLOGOUS RECOMBINATION AND VECTORS FOR ACCOMPLISHING THE SAME 



Field of the Invention 

The present invention relates to a process of targeting the 
integration of a desired exogenous DNA to a specific location 
within the genome of a mammalian cell. More specifically, the 
invention describes a novel method for identifying a 
transcriptionally active target site ("hot spot") in the 
mammalian genome, and inserting a desired DNA at this site via 
homologous recombination. The invention also optionally 
provides the ability for gene amplification of the desired DNA 
at this location by co-integrating an amplifiable selectable 
marker, e.g., DHFR, in combination with the exogenous DNA. The 
invention additionally describes the construction of novel 
vectors suitable for accomplishing the above, and further 
provides mammalian cell lines produced by such methods which 
contain a desired exogenous DNA integrated at a target hot 
spot. 
Background 

Technology for expressing recombinant proteins in both 
prokaryotic and eukaryotic organisms is well established. 
Mammalian cells offer significant advantages over bacteria or 
yeast for protein production, resulting from their ability to 
correctly assemble, glycosylate and post-translationally 
modify recombinantly expressed proteins. After transfection 
into the host cells, recombinant expression constructs can be 
maintained as extrachromosomal elements, or may be integrated 
into the host cell genome. Generation of stably transfected 
mammalian cell lines usually involves the latter; a DNA 
construct encoding a gene of interest along with a drug 
resistance gene (dominant selectable marker) is introduced 
into the host cell, and subsequent growth in the presence of 
the drug allows for the selection of cells that have 
successfully integrated the exogenous DNA. In many instances, 
the gene of interest is linked to a drug resistant selectable 
marker which can later be subjected to gene amplification. The 
gene encoding dihydrofolate reductase (DHFR) is most commonly 
used for this purpose. Growth of cells in the presence of 
methotrexate, a competitive inhibitor of DHFR, leads to 
increased DHFR production by means of amplification of the 
DHFR gene. As flanking regions of DNA will also become 
amplified, the resultant coamplification of a DHFR linked gene 
in the transfected cell line can lead to increased protein 
production, thereby resulting in high level expression of the 
gene of interest. 

While this approach has proven successful, there are a number 
of problems with the system because of the random nature of 
the integration event. These problems exist because expression 
levels are greatly influenced by the effects of the local 
genetic environment at the gene locus, a phenomenon well 
documented in the literature and generally referred to as 
"position effects" (for example, see Al-Shawi et al, Mol. 
Cell. Biol., 10:1192-1198 (1990); Yoshimura et al, Mol. Cell. 
Biol., 7:1296-1299 (1987)). As the vast majority of mammalian 
DNA is in a transcriptionally inactive state, random 
integration methods offer no control over the transcriptional 
fate of the integrated DNA. Consequently, wide variations in 
the expression level of integrated genes can occur, depending 
on the site of integration. For example, integration of 
exogenous DNA into inactive, or transcriptionally "silent," 
regions of the genome will result in little or no expression. 
By contrast, integration into a transcriptionally active site 
may result in high expression. 

Therefore, when the goal of the work is to obtain a high level 
of gene expression, as is typically the desired outcome of 
genetic engineering methods, it is generally necessary to 
screen large numbers of transfectants to find such a high 
producing clone. Additionally, random integration of exogenous 
DNA into the genome can in some instances disrupt important 
cellular genes, resulting in an altered phenotype. These 
factors can make the generation of high expressing stable 
mammalian cell lines a complicated and laborious process. 

Recently, our laboratory has described the use of DNA vectors 
containing translationally impaired dominant selectable 
markers in mammalian gene expression. (This is disclosed in 
U.S. Serial No. 08/147,696, filed November 3, 1993, recently 
allowed.) 

These vectors contain a translationally impaired neomycin 
phosphotransferase (neo) gene as the dominant selectable 
marker, artificially engineered to contain an intron into 
which a DHFR gene along with a gene or genes of interest is 
inserted. Use of these vectors as expression constructs has 
been found to significantly reduce the total number of drug 
resistant colonies produced, thereby facilitating the 
screening procedure in relation to conventional mammalian 
expression vectors. Furthermore, a significant percentage of 
the clones obtained using this system are high expressing 
clones. These results are apparently attributable to the 
modifications made to the neo selectable marker. Due to the 
translational impairment of the neo gene, transfected cells 
will not produce enough neo protein to survive drug selection, 
thereby decreasing the overall number of drug resistant 
colonies. Additionally, a higher percentage of the surviving 
clones will contain the expression vector integrated into 
sites in the genome where basal transcription levels are high, 
resulting in overproduction of neo, thereby allowing the cells 
to overcome the impairment of the neo gene. Concomitantly, the 
genes of interest linked to neo will be subject to similar 
elevated levels of transcription. This same advantage is also 
true as a result of the artificial intron created within neo; 
survival is dependent on the synthesis of a functional neo 
gene, which is in turn dependent on correct and efficient 
splicing of the neo introns. Moreover, these criteria are more 
likely to be met if the vector DNA has integrated into a 
region which is already highly transcriptionally active. 

Following integration of the vector into a transcriptionally 
active region, gene amplification is performed by selection 
for the DHFR gene. Using this system, it has been possible to 
obtain clones selected using low levels of methotrexate 
(50 nM), containing few (<10) copies of the vector, which 
secrete high levels of protein (>55 pg/cell/day). Furthermore, 
this can be achieved in a relatively short period of time. 
However, the success in amplification is variable. Some 
transcriptionally active sites cannot be amplified and 
therefore the frequency and extent of amplification from a 
particular site is not predictable. 

Overall, the use of these translationally impaired vectors 
represents a significant improvement over other methods of 
random integration. However, as discussed, the problem of lack 
of control over the integration site remains a significant 
concern. 

One approach to overcome the problems of random integration is 
by means of gene targeting, whereby the exogenous DNA is 
directed to a specific locus within the host genome. The 
exogenous DNA is inserted by means of homologous recombination 
occurring between sequences of DNA in the expression vector 
and the corresponding homologous sequence in the genome. 
However, while this type of recombination occurs at a high 
frequency naturally in yeast and other fungal organisms, in 
higher eukaryotic organisms it is an extremely rare event. In 
mammalian cells, the frequency of homologous versus 
non-homologous (random integration) recombination is reported 
to range from 1/100 to 1/5000 (for example, see Capecchi, 
Science, 244:1288-1292 (1989); Morrow and Kucherlapati, Curr. 
Op. Biotech., 4:577-582 (1993)). 
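
To give a feel for what the reported range of 1/100 to 1/5000 
implies for screening effort, the following is a minimal 
sketch and not part of the disclosed method. It assumes, as a 
simplification, that the reported ratio can be treated as the 
probability that any one integrant is a homologous 
recombinant, and that integrants are independent. The function 
names are hypothetical. 

```python
import math
from fractions import Fraction


def expected_integrants_per_hit(ratio):
    """Mean number of integrants examined per homologous recombinant.

    Assumes each integrant is homologous with probability `ratio`
    (geometric distribution), which is a simplification.
    """
    return 1 / ratio


def prob_at_least_one_hit(ratio, n_clones):
    """Probability that n_clones integrants include at least one homologous event."""
    return 1 - (1 - float(ratio)) ** n_clones


def clones_for_confidence(ratio, confidence=0.95):
    """Smallest number of integrants giving the requested chance of one hit."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - float(ratio)))


if __name__ == "__main__":
    # The two ends of the range quoted in the text.
    for ratio in (Fraction(1, 100), Fraction(1, 5000)):
        print(ratio,
              expected_integrants_per_hit(ratio),
              clones_for_confidence(ratio))
```

Under these assumptions, the number of integrants that must be 
examined grows in inverse proportion to the recombination 
frequency, which is why a selection scheme that removes random 
integrants is attractive. 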

One of the earliest reports describing homologous 
recombination in mammalian cells comprised an artificial 
system created in mouse fibroblasts (Thomas et al, Cell, 
44:419-428 (1986)). A cell line containing a mutated, 
non-functional version of the neo gene integrated into the 
host genome was created, and subsequently targeted with a 
second non-functional copy of neo containing a different 
mutation. Reconstruction of a functional neo gene could occur 
only by gene targeting. Homologous recombinants were 
identified by selecting for G418 resistant cells, and 
confirmed by analysis of genomic DNA isolated from the 
resistant clones. 

Recently, the use of homologous recombination to replace the 
heavy and light immunoglobulin genes at endogenous loci in 
antibody secreting cells has been reported. (U.S. Patent No. 
5,202,238, Fell et al, (1993).) However, this particular 
approach is not widely applicable, because it is limited to 
the production of immunoglobulins in cells which endogenously 
express immunoglobulins, e.g., B cells and myeloma cells. 
Also, expression is limited to single copy gene levels because 
co-amplification after homologous recombination is not 
included. The method is further complicated by the fact that 
two separate integration events are required to produce a 
functional immunoglobulin: one for the light chain gene 
followed by one for the heavy chain gene. 

An additional example of this type of system has been reported 
in NS/0 cells, where recombinant immunoglobulins are expressed 
by homologous recombination into the immunoglobulin gamma 2A 
locus (Hollis et al, international patent application # 
PCT/IB95 (00014).) Expression levels obtained from this site 
were extremely high, on the order of 20 pg/cell/day from a 
single copy integrant. However, as in the above example, 
expression is limited to this level because an amplifiable 
gene is not cointegrated in this system. Also, other 
researchers have reported aberrant glycosylation of 
recombinant proteins expressed in NS/0 cells (for example, see 
Flesher et al, Biotech. and Bioeng., 48:399-407 (1995)), 
thereby limiting the applicability of this approach. 

The cre-loxP recombination system from bacteriophage P1 has 
recently been adapted and used as a means of gene targeting in 
eukaryotic cells. Specifically, the site specific integration 
of exogenous DNA into the Chinese hamster ovary (CHO) cell 
genome using cre recombinase and a series of lox containing 
vectors has been described. (Fukushige and Sauer, Proc. Natl. 
Acad. Sci. USA, 89:7905-7909 (1992).) This system is 
attractive in that it provides for reproducible expression at 
the same chromosomal location. However, no effort was made to 
identify a chromosomal site from which gene expression is 
optimal, and as in the above example, expression is limited to 
single copy levels in this system. Also, it is complicated by 
the fact that one needs to provide for expression of a 
functional recombinase enzyme in the mammalian cell. 
The use of homologous recombination between an introduced DNA 
sequence and its endogenous chromosomal locus has also been 
reported to provide a useful means of genetic manipulation in 
mammalian cells, as well as in yeast cells. (See, e.g., 
Bradley et al, Meth. Enzymol., 223:855-879 (1993); Capecchi, 
Science, 244:1288-1292 (1989); Rothstein et al, Meth. 
Enzymol., 194:281-301 (1991).) To date, most mammalian gene 
targeting studies have been directed toward gene disruption 
("knockout") or site-specific mutagenesis of selected target 
gene loci in mouse embryonic stem (ES) cells. The creation of 
these "knockout" mouse models has enabled scientists to 
examine specific structure-function issues and examine the 
biological importance of a myriad of mouse genes. This field 
of research also has important implications in terms of 
potential gene therapy applications. 

Also, vectors have recently been reported by Celltech (Kent, 
U.K.) which purportedly are targeted to transcriptionally 
active sites in NS0 cells, which do not require gene 
amplification (Peakman et al, Hum. Antibod. Hybridomas, 
5:65-74 (1994)). However, levels of immunoglobulin secretion 
in these unamplified cells have not been reported to exceed 
20 pg/cell/day, while in amplified CHO cells, levels as high 
as 100 pg/cell/day can be obtained (Id.). 
It would be highly desirable to develop a gene targeting 
system which reproducibly provided for the integration of 
exogenous DNA into a predetermined site in the genome known to 
be transcriptionally active. Also, it would be desirable if 
such a gene targeting system would further facilitate 
co-amplification of the inserted DNA after integration. The 
design of such a system would allow for the reproducible and 
high level expression of any cloned gene of interest in a 
mammalian cell, and undoubtedly would be of significant 
interest to many researchers. 

In this application, we provide a novel mammalian expression 
system, based on homologous recombination occurring between 
two artificial substrates contained in two different vectors. 
Specifically, this system uses a combination of two novel 
mammalian expression vectors, referred to as a "marking" 
vector and a "targeting" vector. 

Essentially, the marking vector enables the identification and 
marking of a site in the mammalian genome which is 
transcriptionally active, i.e., a site at which gene 
expression levels are high. This site can be regarded as a 
"hot spot" in the genome. After integration of the marking 
vector, the subject expression system enables another DNA to 
be integrated at this site, i.e., the targeting vector, by 
means of homologous recombination occurring between DNA 
sequences common to both vectors. This system affords 
significant advantages over other homologous recombination 
systems. 

Unlike most other homologous systems employed in mammalian 
cells, this system exhibits no background. Therefore, cells 
which have only undergone random integration of the vector do 
not survive the selection. Thus, any gene of interest cloned 
into the targeting plasmid is expressed at high levels from 
the marked hot spot. Accordingly, the subject method of gene 
expression substantially or completely eliminates the problems 
inherent to systems of random integration, discussed in detail 
above. Moreover, this system provides reproducible and high 
level expression of any recombinant protein at the same 
transcriptionally active site in the mammalian genome. In 
addition, gene amplification may be effected at this 
particular transcriptionally active site by including an 
amplifiable dominant selectable marker (e.g., DHFR) as part of 
the marking vector. 
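
The "no background" property can be illustrated with a 
deliberately simplified model. It relies on the arrangement 
set out in the Detailed Description, where the third neo exon 
resides on the marking plasmid and exons 1 and 2 reside on the 
targeting plasmid. The sketch below is an assumption-laden toy: 
the genome is reduced to named loci, a cell is treated as G418 
resistant only when a single locus carries all three exons, 
and the names `integrate`, `survives_g418`, "hot_spot" and 
"elsewhere" are hypothetical. 

```python
NEO_EXONS = frozenset({1, 2, 3})


def integrate(genome, locus, exons):
    """Return a copy of `genome` with `exons` added at `locus`."""
    updated = {name: set(present) for name, present in genome.items()}
    updated.setdefault(locus, set()).update(exons)
    return updated


def survives_g418(genome):
    """Toy rule: resistant only if one locus carries every neo exon."""
    return any(NEO_EXONS <= present for present in genome.values())


# Marking vector integrates first and contributes neo exon 3.
marked = integrate({}, "hot_spot", {3})

# Targeting vector carries exons 1 and 2; only integration at the
# marked site brings all three exons together.
targeted = integrate(marked, "hot_spot", {1, 2})
random_integrant = integrate(marked, "elsewhere", {1, 2})

print(survives_g418(marked))            # marked cell alone
print(survives_g418(targeted))          # homologous integration
print(survives_g418(random_integrant))  # random integration
```

In this model only the homologous integrant reconstitutes a 
complete neo gene, which mirrors the statement that randomly 
integrated cells do not survive selection. Real cells involve 
splicing, orientation and expression effects that the model 
ignores. 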

Objects of the Invention 

Thus, it is an object of the invention to provide an improved 
method for targeting a desired DNA to a specific site in a 
mammalian cell. 

It is a more specific object of the invention to provide a 
novel method for targeting a desired DNA to a specific site in 
a mammalian cell via homologous recombination. 

It is another specific object of the invention to provide 
novel vectors for achieving site specific integration of a 
desired DNA in a mammalian cell. 

It is still another object of the invention to provide novel 
mammalian cell lines which contain a desired DNA integrated at 
a predetermined site which provides for high expression. 

It is a more specific object of the invention to provide a 
novel method for achieving site specific integration of a 
desired DNA in a Chinese hamster ovary (CHO) cell. 

It is another more specific object of the invention to provide 
a novel method for integrating immunoglobulin genes, or any 
other genes, in mammalian cells at predetermined chromosomal 
sites that provide for high expression. 

It is another specific object of the invention to provide 
novel vectors and vector combinations suitable for integrating 
immunoglobulin genes into mammalian cells at predetermined 
sites that provide for high expression. 

It is another object of the invention to provide mammalian 
cell lines which contain immunoglobulin genes integrated at 
predetermined sites that provide for high expression. 

It is an even more specific object of the invention to provide 
a novel method for integrating immunoglobulin genes into CHO 
cells that provide for high expression, as well as novel 
vectors and vector combinations that provide for such 
integration of immunoglobulin genes into CHO cells. 

In addition, it is a specific object of the invention to 
provide novel CHO cell lines which contain immunoglobulin 
genes integrated at predetermined sites that provide for high 
expression, and have been amplified by methotrexate selection 
to secrete even greater amounts of functional immunoglobulins. 

Brief Description of the Figures 

Figure 1 depicts a map of a marking plasmid according to the 
invention referred to as Desmond. The plasmid is shown in 
circular form (1a) as well as a linearized version used for 
transfection (1b). 

Figure 2(a) shows a map of a targeting plasmid referred to as 
"Molly". Molly is shown here encoding the anti-CD20 
immunoglobulin genes, expression of which is described in 
Example 1. 

Figure 2(b) shows a linearized version of Molly, after 
digestion with the restriction enzymes KpnI and PacI. This 
linearized form was used for transfection. 

Figure 3 depicts the potential alignment between Desmond 
sequences integrated into the CHO genome, and incoming 
targeting Molly sequences. One potential arrangement of Molly 
integrated into Desmond after homologous recombination is also 
presented. 

Figure 4 shows a Southern analysis of single copy Desmond 
clones. Samples are as follows: 
Lane 1: λHindIII DNA size marker 
Lane 2: Desmond clone 10F3 
Lane 3: Desmond clone 10C12 
Lane 4: Desmond clone 15C9 
Lane 5: Desmond clone 14B5 
Lane 6: Desmond clone 9B2 

Figure 5 shows a Northern analysis of single copy Desmond 
clones. Samples are as follows: Panel A: northern probed with 
CAD and DHFR probes, as indicated on the figure. Panel B: 
duplicate northern, probed with CAD and HisD probes, as 
indicated. The RNA samples loaded in panels A and B are as 
follows: 
Lane 1: clone 9B2 
Lane 2: clone 10C12 
Lane 3: clone 14B5 
Lane 4: clone 15C9 
Lane 5: control RNA from CHO transfected with a HisD and DHFR 
containing plasmid 
Lane 6: untransfected CHO 

Figure 6 shows a Southern analysis of clones resulting from 
the homologous integration of Molly into Desmond. Samples are 
as follows: 
Lane 1: λHindIII DNA size markers 
Lane 2: 20F4 
Lane 3: 5F9 
Lane 4: 21C7 
Lane 5: 24G2 
Lane 6: 25E1 
Lane 7: 28C9 
Lane 8: 29F9 
Lane 9: 39G11 
Lane 10: 42F9 
Lane 11: 50G10 
Lane 12: Molly plasmid DNA, linearized with BglII (top band) 
and cut with BglII and KpnI (lower band) 
Lane 13: untransfected Desmond 

Figures 7A through 7G contain the Sequence Listing 
for Desmond. 

Figures 8A through 8I contain the Sequence Listing for Molly 
containing anti-CD20. 

Figure 9 contains a map of the targeting plasmid, "Mandy," 
shown here encoding anti-CD23 genes, the expression of which 
is disclosed in Example 5. 

Figures 10A through 10N contain the sequence listing of 
"Mandy" containing the anti-CD23 genes as disclosed in 
Example 5. 

Detailed Description of the Invention 

The invention provides a novel method for integrating a 
desired exogenous DNA at a target site within the genome of a 
mammalian cell via homologous recombination. Also, the 
invention provides novel vectors for achieving the site 
specific integration of a DNA at a target site in the genome 
of a mammalian cell. 

More specifically, the subject cloning method pro- 
vides for site specific integration of a desired DNA in 
a mammalian cell by transfection of such cell with a 
"marker plasmid" which contains a unique sequence that 
is foreign to the mammalian cell genome and which 
provides a substrate for homologous recombination,
followed by transfection with a "target plasmid" containing
a sequence which provides for homologous recombination
with the unique sequence contained in the marker
plasmid, and further comprising a desired DNA that is to
be integrated into the mammalian cell. Typically, the
integrated DNA will encode a protein of interest, such
as an immunoglobulin or other secreted mammalian
glycoprotein.

The exemplified homologous recombination system 
uses the neomycin phosphotransferase gene as a dominant 
selectable marker. This particular marker was utilized
based on the following previously published
observations:

(i) the demonstrated ability to target and restore
function to a mutated version of the neo gene (cited
earlier) and

(ii) our development of translationally impaired
expression vectors, in which the neo gene has been
artificially created as two exons with a gene of interest
inserted in the intervening intron; neo exons are
correctly spliced and translated in vivo, producing a
functional protein and thereby conferring G418 resistance on
the resultant cell population. In this application, the
neo gene is split into three exons. The third exon of
neo is present on the "marker" plasmid and becomes
integrated into the host cell genome upon integration of the
marker plasmid into the mammalian cells. Exons 1 and 2
are present on the targeting plasmid, and are separated
by an intervening intron into which at least one gene of
interest is cloned. Homologous recombination of the
targeting vector with the integrated marking vector
results in correct splicing of all three exons of the
neo gene and thereby expression of a functional neo
protein (as determined by selection for G418 resistant
colonies). Prior to designing the current expression
system, we had experimentally tested the functionality
of such a triply spliced neo construct in mammalian
cells. The results of this control experiment indicated
that all three neo exons were properly spliced and
therefore suggested the feasibility of the subject
invention.
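
The selection logic of the split neo marker can be summarized in a small illustrative model. The Python sketch below is only a toy abstraction and is not part of the disclosed method: the names `MARKER_EXONS`, `TARGETING_EXONS`, `integrate` and `neo_functional` are hypothetical, and the model simply assumes that G418 resistance requires neo exons 1, 2 and 3 to be linked in order at one locus.

```python
# Toy model of the split neo marker (an illustrative assumption, not a
# simulation of real recombination): resistance requires exons 1, 2 and 3
# to be linked in order at a single locus.
MARKER_EXONS = (3,)        # neo exon 3, carried by the marker plasmid
TARGETING_EXONS = (1, 2)   # neo exons 1 and 2, carried by the targeting plasmid


def integrate(homologous: bool) -> tuple:
    """Return the neo exons linked at the locus where the targeting plasmid lands."""
    if homologous:
        # The targeting plasmid recombines at the marked site, upstream of exon 3.
        return TARGETING_EXONS + MARKER_EXONS
    # Random integration: exons 1 and 2 land away from the marked exon 3.
    return TARGETING_EXONS


def neo_functional(linked_exons: tuple) -> bool:
    """A functional neo transcript needs all three exons, in order."""
    return tuple(linked_exons) == (1, 2, 3)


if __name__ == "__main__":
    for homologous in (True, False):
        exons = integrate(homologous)
        print(f"homologous={homologous} exons={exons} "
              f"G418 resistant={neo_functional(exons)}")
```

In this simplified picture, only the homologous event reconstitutes the complete exon series, which is the reason G418 selection enriches for targeted integrants.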

However, while the present invention is exemplified
using the neo gene, and more specifically a triple split
neo gene, the general methodology should be efficacious
with other dominant selectable markers.

As discussed in greater detail infra, the present
invention affords numerous advantages over conventional
gene expression methods, including both random
integration and gene targeting methods. Specifically, the
subject invention provides a method which reproducibly
allows for site-specific integration of a desired DNA
into a transcriptionally active domain of a mammalian
cell. Moreover, because the subject method introduces
an artificial region of "homology" which acts as a
unique substrate for homologous recombination and the
insertion of a desired DNA, the efficacy of the subject
invention does not require that the cell endogenously
contain or express a specific DNA. Thus, the method is
generically applicable to all mammalian cells, and can
be used to express any type of recombinant protein.

The use of a triply spliced selectable marker,
e.g., the exemplified triply spliced neo construct,
guarantees that all G418 resistant colonies produced
will arise from a homologous recombination event (random
integrants will not produce a functional neo gene and
consequently will not survive G418 selection). Thus,
the subject invention makes it easy to screen for the
desired homologous event. Furthermore, the frequency of
additional random integrations in a cell that has
undergone a homologous recombination event appears to be low.

Based on the foregoing, it is apparent that a
significant advantage of the invention is that it
substantially reduces the number of colonies that need be
screened to identify high producer clones, i.e., cell
lines containing a desired DNA which secrete the
corresponding protein at high levels. On average, clones
containing integrated desired DNA may be identified by
screening about 5 to 20 colonies (compared to several
thousand which must be screened when using standard
random integration techniques, or several hundred using
the previously described intronic insertion vectors).
Additionally, as the site of integration was preselected
and comprises a transcriptionally active domain, all
exogenous DNA expressed at this site should produce
comparable, i.e. high, levels of the protein of interest.

Moreover, the subject invention is further
advantageous in that it enables an amplifiable gene to be
inserted on integration of the marking vector. Thus,
when a desired gene is targeted to this site via
homologous recombination, the subject invention allows
for expression of the gene to be further enhanced by
gene amplification. In this regard, it has been
reported in the literature that different genomic
sites have different capacities for gene amplification
(Meinkoth et al, Mol. Cell Biol., 7:1415-1424 (1987)).
Therefore, this technique is further advantageous as it
allows for the placement of a desired gene of interest
at a specific site that is both transcriptionally active
and easily amplified. This should significantly
reduce the amount of time required to isolate
such high producers.
Specifically, while conventional methods for the
construction of high expressing mammalian cell lines can
take 6 to 9 months, the present invention allows for
such clones to be isolated on average after only about
3-6 months. This is due to the fact that conventionally
isolated clones typically must be subjected to at least
three rounds of drug resistant gene amplification in
order to reach satisfactory levels of gene expression.
As the homologously produced clones are generated from a
preselected site which is a high expression site, fewer
rounds of amplification should be required before
reaching a satisfactory level of production.

Still further, the subject invention enables the
reproducible selection of high producer clones wherein
the vector is integrated at low copy number, typically
single copy. This is advantageous as it enhances the
stability of the clones and avoids other potential
adverse side-effects associated with high copy number. As
described supra, the subject homologous recombination
system uses the combination of a "marker plasmid" and a
"targeting plasmid" which are described in more detail
below.

The "marker plasmid" which is used to mark and
identify a transcriptional hot spot will comprise at
least the following sequences:

(i) a region of DNA that is heterologous or unique
to the genome of the mammalian cell, which functions as
a source of homology and allows for homologous
recombination (with a DNA contained in a second target plasmid).
More specifically, the unique region of DNA (i) will
generally comprise a bacterial, viral, yeast, synthetic,
or other DNA which is not normally present in the
mammalian cell genome and which further does not
comprise significant homology or sequence identity to
DNA contained in the genome of the mammalian cell.
Essentially, this sequence should be sufficiently
different from mammalian DNA that it will not
significantly recombine with the host cell genome via
homologous recombination. The size of such unique DNA
will generally be at least about 2 to 10 kilobases in
size, or higher, more preferably at least about 10kb, as
several other investigators have noted an increased
frequency of targeted recombination as the size of the
homology region is increased (Capecchi, Science,
244:1288-1292 (1989)).

The upper size limit of the unique DNA which acts
as a site for homologous recombination with a sequence
in the second target vector is largely dictated by
potential stability constraints (if DNA is too large it
may not be easily integrated into a chromosome) and the
difficulties in working with very large DNAs.

(ii) a DNA including a fragment of a selectable
marker DNA, typically an exon of a dominant selectable
marker gene. The only essential feature of this DNA is
that it not encode a functional selectable marker
protein unless it is expressed in association with a
sequence contained in the target plasmid. Typically, the
target plasmid will comprise the remaining exons of the
dominant selectable marker gene (those not comprised in
the "marker" plasmid). Essentially, a functional
selectable marker should only be produced if homologous
recombination occurs (resulting in the association and
expression of this marker DNA (ii) sequence together with
the portion(s) of the selectable marker DNA fragment
which is (are) contained in the target plasmid).

As noted, the current invention exemplifies the
use of the neomycin phosphotransferase gene as the
dominant selectable marker which is "split" in the two
vectors. However, other selectable markers should also be
suitable, e.g., the Salmonella histidinol dehydrogenase
gene, hygromycin phosphotransferase gene, herpes simplex
virus thymidine kinase gene, adenosine deaminase gene,
glutamine synthetase gene and hypoxanthine-guanine
phosphoribosyl transferase gene.

(iii) a DNA which encodes a functional selectable
marker protein, which selectable marker is different
from the selectable marker DNA (ii). This selectable
marker provides for the successful selection of
mammalian cells wherein the marker plasmid is successfully
integrated into the cellular DNA. More preferably, it
is desirable that the marker plasmid comprise two such
dominant selectable marker DNAs, situated at opposite
ends of the vector. This is advantageous as it enables
integrants to be selected using different selection
agents and further enables cells which contain the
entire vector to be selected. Additionally, one marker
can be an amplifiable marker to facilitate gene
amplification as discussed previously. Any of the
dominant selectable markers listed in (ii) can be used as
well as others generally known in the art.

Moreover, the marker plasmid may optionally further
comprise a rare endonuclease restriction site. This is
potentially desirable as this may facilitate cleavage.
If present, such rare restriction site should be
situated close to the middle of the unique region that acts as
a substrate for homologous recombination. Preferably
such sequence will be at least about 12 nucleotides.
The introduction of a double stranded break by similar
methodology has been reported to enhance the frequency
of homologous recombination (Choulika et al, Mol.
Cell. Biol., 15:1968-1973 (1995)). However, the
presence of such sequence is not essential.

The "targeting plasmid" will comprise at least the 
following sequences: 

(1) the same unique region of DNA contained in the
marker plasmid or one having sufficient homology or
sequence identity therewith that said DNA is capable of
combining via homologous recombination with the unique
region (i) in the marker plasmid. Suitable types of
DNAs are described supra in the description of the
unique region of DNA (i) in the marker plasmid.

(2) The remaining exons of the dominant selectable
marker, one exon of which is included as (ii) in the
marker plasmid listed above. The essential features of
this DNA fragment are that it result in a functional
(selectable) marker protein only if the target plasmid
integrates via homologous recombination (wherein such
recombination results in the association of this DNA
with the other fragment of the selectable marker DNA
contained in the marker plasmid) and further that it
allow for insertion of a desired exogenous DNA.
Typically, this DNA will comprise the remaining exons of the
selectable marker DNA which are separated by an intron.
For example, this DNA may comprise the first two exons
of the neo gene and the marker plasmid may comprise the
third exon (back third of neo).

(3) The target plasmid will also comprise a
desired DNA, e.g., one encoding a desired polypeptide,
preferably inserted within the selectable marker DNA
fragment contained in the plasmid. Typically, the DNA
will be inserted in an intron which is comprised between
the exons of the selectable marker DNA. This ensures
that the desired DNA is also integrated if homologous
recombination of the target plasmid and the marker
plasmid occurs. This intron may be naturally occurring or
it may be engineered into the dominant selectable marker
DNA fragment.

This DNA will encode any desired protein,
preferably one having pharmaceutical or other desirable
properties. Most typically the DNA will encode a
mammalian protein, and in the current examples provided,
an immunoglobulin or an immunoadhesin. However, the
invention is not in any way limited to the production of
immunoglobulins.

As discussed previously, the subject cloning method
is suitable for any mammalian cell as it does not
require for efficacy that any specific mammalian sequence
or sequences be present. In general, such mammalian
cells will comprise those typically used for protein
expression, e.g., CHO cells, myeloma cells, COS cells,
BHK cells, Sp2/0 cells, NIH 3T3 and HeLa cells. In the
examples which follow, CHO cells were utilized. The
advantages thereof include the availability of suitable
growth medium, their ability to grow efficiently and to
high density in culture, and their ability to express
mammalian proteins such as immunoglobulins in
biologically active form.

Further, CHO cells were selected in large part 
because of previous usage of such cells by the inventors 
for the expression of immunoglobulins (using the trans- 
lationally impaired dominant selectable marker contain- 
ing vectors described previously) . Thus, the present 
laboratory has considerable experience in using such 
cells for expression. However, based on the examples 
which follow, it is reasonable to expect similar results 
will be obtained with other mammalian cells. 

In general, transformation or transfection of
mammalian cells according to the subject invention will be
effected according to conventional methods. So that the
invention may be better understood, the construction of
exemplary vectors and their usage in producing
integrants is described in the examples below.

EXAMPLE 1

Design and Preparation of Marker
and Targeting Plasmid DNA Vectors

The marker plasmid herein referred to as "Desmond"
was assembled from the following DNA elements:

(a) Murine dihydrofolate reductase gene (DHFR),
incorporated into a transcription cassette, comprising
the mouse beta globin promoter 5' to the DHFR start
site, and bovine growth hormone polyadenylation signal
3' to the stop codon. The DHFR transcriptional cassette
was isolated from TCAE6, an expression vector created
previously in this laboratory (Newman et al, 1992,
Biotechnology, 10:1455-1460).

(b) E. coli β-galactosidase gene, commercially
available, obtained from Promega as pSV-β-galactosidase
control vector, catalog # E1081.

(c) Baculovirus DNA, commercially available,
purchased from Clontech as pBAKPAK8, cat # 6145-1.

(d) Cassette comprising promoter and enhancer
elements from Cytomegalovirus and SV40 virus. The cassette
was generated by PCR using a derivative of expression
vector TCAE8 (Reff et al, Blood, 83:435-445 (1994)).
The enhancer cassette was inserted within the
baculovirus sequence, which was first modified by the
insertion of a multiple cloning site.

(e) E. coli GUS (glucuronidase) gene, commercially
available, purchased from Clontech as pBI101, cat. #
6017-1.

(f) Firefly luciferase gene, commercially
available, obtained from Promega as pGEM-Luc (catalog #
E1541).

(g) S. typhimurium histidinol dehydrogenase gene
(HisD). This gene was originally a gift (Donahue
et al, Gene, 18:47-59 (1982)), and has subsequently been
incorporated into a transcription cassette comprising
the mouse beta globin major promoter 5' to the gene, and
the SV40 polyadenylation signal 3' to the gene.

The DNA elements described in (a)-(g) were combined
into a pBR derived plasmid backbone to produce a 7.7kb
contiguous stretch of DNA referred to in the attached
figures as "homology". Homology in this sense refers to
sequences of DNA which are not part of the mammalian
genome and are used to promote homologous recombination
between transfected plasmids sharing the same homology
DNA sequences.

(h) Neomycin phosphotransferase gene from Tn5
(Davis and Smith, Ann. Rev. Micro., 32:469-518 (1978)).
The complete neo gene was subcloned into pBluescript
SK- (Stratagene catalog # 212205) to facilitate genetic
manipulation. A synthetic linker was then inserted into
a unique PstI site occurring across the codons for amino
acid 51 and 52 of neo. This linker encoded the
necessary DNA elements to create an artificial splice donor
site, intervening intron and splice acceptor site within
the neo gene, thus creating two separate exons,
presently referred to as neo exon 1 and 2. Neo exon 1 encodes
the first 51 amino acids of neo, while exon 2 encodes
the remaining 203 amino acids plus the stop codon of the
protein. A NotI cloning site was also created within the
intron.

Neo exon 2 was further subdivided to produce neo
exons 2 and 3. This was achieved as follows: A set of
PCR primers was designed to amplify a region of DNA
encoding neo exon 1, intron and the first 111 2/3 amino
acids of exon 2. The 3' PCR primer resulted in the
introduction of a new 5' splice site immediately after
the second nucleotide of the codon for amino acid 111 in
exon 2, therefore generating a new smaller exon 2. The
DNA fragment now encoding the original exon 1, intron
and new exon 2 was then subcloned and propagated in a
pBR based vector. The remainder of the original exon 2
was used as a template for another round of PCR
amplification, which generated "exon 3". The 5' primer
for this round of amplification introduced a new splice
acceptor site at the 5' side of the newly created exon
3, i.e. before the final nucleotide of the codon for
amino acid 111. The resultant 3 exons of neo encode the
following information: exon 1 - the first 51 amino acids
of neo; exon 2 - the next 111 2/3 amino acids, and exon
3 - the final 91 1/3 amino acids plus the translational
stop codon of the neo gene.
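
The exon sizes stated above can be cross-checked with simple arithmetic. The following Python sketch is offered only as a consistency check under the figures given in the text (51, 111 2/3 and 91 1/3 amino acids, and 203 amino acids for the original exon 2); the variable and function names are hypothetical.

```python
from fractions import Fraction

# Amino acids encoded by each neo exon, as stated in the text.
EXON_AA = {
    "exon 1": Fraction(51),
    "exon 2": Fraction(111) + Fraction(2, 3),
    "exon 3": Fraction(91) + Fraction(1, 3),
}
ORIGINAL_EXON2_AA = 203  # size of exon 2 before it was subdivided


def coding_nucleotides(amino_acids: Fraction) -> int:
    """Convert a (possibly fractional) amino acid count to coding nucleotides."""
    nucleotides = amino_acids * 3
    if nucleotides.denominator != 1:
        raise ValueError("exon boundary does not fall on a whole nucleotide")
    return int(nucleotides)


TOTAL_AA = sum(EXON_AA.values())
CODING_NT = {name: coding_nucleotides(aa) for name, aa in EXON_AA.items()}

if __name__ == "__main__":
    print("total amino acids:", TOTAL_AA)
    print("new exon 2 + exon 3:", EXON_AA["exon 2"] + EXON_AA["exon 3"])
    print("coding nucleotides per exon (stop codon excluded):", CODING_NT)
    # Exon 2 ends two nucleotides into a codon; exon 3 supplies the last one.
    print("exon 2 nt mod 3:", CODING_NT["exon 2"] % 3)
    print("exon 3 nt mod 3:", CODING_NT["exon 3"] % 3)
```

The fractional amino acid counts correspond to a split codon: two nucleotides at the end of exon 2 and one at the start of exon 3, which agrees with the splice site positions described above.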
Neo exon 3 was incorporated along with the above
mentioned DNA elements into the marking plasmid
"Desmond". Neo exons 1 and 2 were incorporated into the
targeting plasmid "Molly". The NotI cloning site
created within the intron between exons 1 and 2 was used in
subsequent cloning steps to insert genes of interest
into the targeting plasmid.

A second targeting plasmid "Mandy" was also
generated. This plasmid is almost identical to "Molly"
(some restriction sites on the vector have been changed)
except that the original HisD and DHFR genes contained
in "Molly" were inactivated. These changes were
incorporated because the Desmond cell line was no longer
being cultured in the presence of Histidinol, therefore
it seemed unnecessary to include a second copy of the
HisD gene. Additionally, the DHFR gene was inactivated
to ensure that only a single DHFR gene, namely the one
present in the Desmond marked site, would be amplifiable
in any resulting cell lines. "Mandy" was derived from
"Molly" by the following modifications:

(i) A synthetic linker was inserted in the middle
of the DHFR coding region. This linker created a stop
codon and shifted the remainder of the DHFR coding
region out of frame, therefore rendering the gene
nonfunctional.

(ii) A portion of the HisD gene was deleted and
replaced with a PCR generated HisD fragment lacking the
promoter and start codon of the gene.
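
The effect of a linker that both introduces a stop codon and shifts the reading frame, as described in modification (i), can be illustrated with a toy sequence. The Python sketch below is a hedged illustration only: `TOY_ORF` and `TOY_LINKER` are invented sequences, not the DHFR gene or the actual linker, and the helper names are hypothetical. Only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_linker(orf: str, linker: str, codon_index: int) -> str:
    """Insert a linker at a codon boundary of the open reading frame."""
    position = codon_index * 3
    return orf[:position] + linker + orf[position:]


TOY_ORF = "ATGGCTGAAACCGGTAAACTGTAA"  # invented ORF encoding M-A-E-T-G-K-L
TOY_LINKER = "TAGC"                    # invented: in-frame TAG stop + 1 extra base

MUTANT = insert_linker(TOY_ORF, TOY_LINKER, 3)

if __name__ == "__main__":
    print("intact ORF :", translate(TOY_ORF))
    print("with linker:", translate(MUTANT))
    # Reading past the inserted stop shows that the downstream frame is shifted.
    print("downstream :", translate(MUTANT[12:]))
```

In this toy case the linker truncates the product at the insertion point, and because its length is not a multiple of three, any read-through of the downstream sequence is also out of frame.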

Figure 1 depicts the arrangement of these DNA
elements in the marker plasmid "Desmond". Figure 2 depicts
the arrangement of these elements in the first targeting
plasmid, "Molly". Figure 3 illustrates the possible
arrangement in the CHO genome of the various DNA
elements after targeting and integration of Molly DNA
into Desmond marked CHO cells. Figure 9 depicts the
targeting plasmid "Mandy."

Construction of the marking and targeting plasmids
from the above listed DNA elements was carried out
following conventional cloning techniques (see, e.g.,
Molecular Cloning, A Laboratory Manual, J. Sambrook et
al, 1987, Cold Spring Harbor Laboratory Press, and
Current Protocols in Molecular Biology, F. M. Ausubel et
al, eds., 1987, John Wiley and Sons). All plasmids were
propagated and maintained in E. coli XL1 blue
(Stratagene, cat. # 200236). Large scale plasmid
preparations were prepared using Promega Wizard Maxiprep
DNA Purification System®, according to the
manufacturer's directions.

EXAMPLE 2

Construction of a Marked CHO Cell Line

1. Cell Culture and Transfection Procedures to
Produce Marked CHO Cell Line
Marker plasmid DNA was linearized by digestion
overnight at 37°C with Bst1107I. Linearized vector was
ethanol precipitated and resuspended in sterile TE to a
concentration of 1mg/ml. Linearized vector was
introduced into DHFR- Chinese hamster ovary (CHO)
DG44 cells (Urlaub et al, Som. Cell and Mol. Gen.,
12:555-566 (1986)) by electroporation as follows.

Exponentially growing cells were harvested by
centrifugation, washed once in ice cold SBS (sucrose
buffered solution, 272mM sucrose, 7mM sodium phosphate,
pH 7.4, 1mM magnesium chloride) then resuspended in SBS
to a concentration of 10^ cells/ml. After a 15 minute
incubation on ice, 0.4ml of the cell suspension was
mixed with ^Ofj^g linearized DNA in a disposable
electroporation cuvette. Cells were shocked using a BTX
electrocell manipulator (San Diego, CA) set at 230
volts, 400 microfaraday capacitance, 13 ohm resistance.
Shocked cells were then mixed with 20 ml of prewarmed
CHO growth media (CHO-S-SFMII, Gibco/BRL, catalog #
31033-012) and plated in 96 well tissue culture plates.
Forty eight hours after electroporation, plates were fed
with selection media (in the case of transfection with
Desmond, selection media is CHO-S-SFMII without
hypoxanthine or thymidine, supplemented with 2mM
Histidinol (Sigma catalog # H6647)). Plates were
maintained in selection media for up to 30 days, or until
some of the wells exhibited cell growth. These cells
were then removed from the 96 well plates and expanded
ultimately to 120 ml spinner flasks where they were
maintained in selection media at all times.

EXAMPLE 3

Characterization of Marked CHO Cell Line

(a) Southern Analysis

Genomic DNA was isolated from all stably growing
Desmond marked CHO cells. DNA was isolated using the
Invitrogen Easy® DNA kit, according to the
manufacturer's directions. Genomic DNA was then digested with
HindIII overnight at 37°C, and subjected to Southern
analysis using a PCR generated digoxygenin labelled
probe specific to the DHFR gene. Hybridizations and
washes were carried out using Boehringer Mannheim's DIG
easy hyb (catalog # 1603 558) and DIG Wash and Block
Buffer Set (catalog # 1585 762) according to the
manufacturer's directions. DNA samples containing a single
band hybridizing to the DHFR probe were assumed to be
Desmond clones arising from a single cell which had
integrated a single copy of the plasmid. These clones
were retained for further analysis. Out of a total of
45 HisD resistant cell lines isolated, only 5 were
single copy integrants. Figure 4 shows a Southern blot
containing all 5 of these single copy Desmond clones.
Clone names are provided in the figure legend.
(b) Northern Analysis

Total RNA was isolated from all single copy Desmond
clones using TRIzol reagent (Gibco/BRL cat # 15596-026)
according to the manufacturer's directions. 10-20 µg RNA
from each clone was analyzed on duplicate formaldehyde
gels. The resulting blots were probed with PCR
generated digoxygenin labelled DNA probes to (i) DHFR
message, (ii) HisD message and (iii) CAD message. CAD
is a trifunctional protein involved in uridine
biosynthesis (Wahl et al, J. Biol. Chem., 254, 17:8679-
8689 (1979)), and is expressed equally in all cell
types. It is used here as an internal control to help
quantitate RNA loading. Hybridizations and washes were
carried out using the above mentioned Boehringer
Mannheim reagents. The results of the Northern analysis
are shown in Figure 5. The single copy Desmond clone
exhibiting the highest levels of both the HisD and DHFR
message is clone 15C9, shown in lane 4 in both panels of
the figure. This clone was designated as the "marked
cell line" and used in future targeting experiments in
CHO, examples of which are presented in the following
sections.

EXAMPLE 4

Expression of Anti-CD20 Antibody
in Desmond Marked CHO Cells

C2B8, a chimeric antibody which recognizes B-cell
surface antigen CD20, has been cloned and expressed
previously in our laboratory (Reff et al, Blood,
83:435-445 (1994)). A 4.1 kb DNA fragment comprising the
C2B8 light and heavy chain genes, along with the
necessary regulatory elements (eukaryotic promoter and
polyadenylation signals), was inserted into the artificial
intron created between exons 1 and 2 of the neo gene
contained in a pBR derived cloning vector. This newly
generated 5kb DNA fragment (comprising neo exon 1, C2B8
and neo exon 2) was excised and used to assemble the
targeting plasmid Molly. The other DNA elements used in
the construction of Molly are identical to those used to
construct the marking plasmid Desmond, identified
previously. A complete map of Molly is shown in Fig. 2.

The targeting vector Molly was linearized prior to
transfection by digestion with KpnI and PacI, ethanol
precipitated and resuspended in sterile TE to a
concentration of 1.5mg/mL. Linearized plasmid was introduced
into exponentially growing Desmond marked cells
essentially as described, except that ao^g DNA was used in
each electroporation. Forty eight hours
postelectroporation, 96 well plates were supplemented with selection
medium - CHO-S-SFMII supplemented with 400 µg/mL
Geneticin (G418, Gibco/BRL catalog # 10131-019). Plates were
maintained in selection medium for up to 30 days, or
until cell growth occurred in some of the wells. Such
growth was assumed to be the result of clonal expansion
of a single G418 resistant cell. The supernatants from
all G418 resistant wells were assayed for C2B8
production by standard ELISA techniques, and all productive
clones were eventually expanded to 120mL spinner flasks
and further analyzed.

Characterization of Antibody Secreting Targeted Cells

A total of 50 electroporations with Molly targeting
plasmid were carried out in this experiment, each of
which was plated into separate 96 well plates. A total
of 10 viable, anti-CD20 antibody secreting clones were
obtained and expanded to 120ml spinner flasks. Genomic
DNA was isolated from all clones, and Southern analyses
were subsequently performed to determine whether the
clones represented single homologous recombination
events or whether additional random integrations had
occurred in the same cells. The methods for DNA
isolation and Southern hybridization were as described in the
previous section. Genomic DNA was digested with EcoRI
and probed with a PCR generated digoxygenin labelled
probe to a segment of the CD20 heavy chain constant
region. The results of this Southern analysis are
presented in Figure 6. As can be seen in the figure, 8 of
the 10 clones show a single band hybridizing to the CD20
probe, indicating a single homologous recombination
event has occurred in these cells. Two of the ten,
clones 24G2 and 28C9, show the presence of additional
band(s), indicative of an additional random integration
elsewhere in the genome.

We examined the expression levels of anti-CD20 
antibody in all ten of these clones, the data for which 
is shown in Table 1, below. 

Table 1:

Expression Level of Anti-CD20
Secreting Homologous Integrants

Clone      Anti-CD20, pg/c/d

20F4       3.5
25E1       2.4
42F9       1.8
39G11      1.5
21C7       1.3
50G10      0.9
29F9       0.8
5F9        0.3

28C9*      4.5
24G2*      2.1

* These clones contained additional randomly
integrated copies of anti-CD20. Expression
levels of these clones therefore reflect a
contribution from both the homologous and
random sites.

Expression levels are reported as picogram per cell per
day (pg/c/d) secreted by the individual clones, and
represent the mean levels obtained from three separate
ELISAs on samples taken from 120 mL spinner flasks.

As can be seen from the data, there is a variation 
in antibody secretion of approximately ten fold between 
the highest and lowest clones. This was somewhat unex- 
pected as we anticipated similar expression levels from 
all clones due to the fact the anti-CD20 genes are all 
integrated into the same Desmond marked site.
Nevertheless, this observed range in expression is extremely small
in comparison to that seen using any traditional random
integration method or with our translationally impaired
vector system.
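
The spread in expression among the single copy integrants can be recomputed directly from the Table 1 values. The Python sketch below is a simple worked calculation; the dictionary and function names are hypothetical, and the two clones carrying additional random integrations (28C9 and 24G2) are deliberately left out.

```python
# Anti-CD20 expression (pg/c/d) for the eight single copy homologous
# integrants listed in Table 1; clones 28C9 and 24G2 are excluded because
# they carry additional random integrations.
SINGLE_COPY_PG_C_D = {
    "20F4": 3.5,
    "25E1": 2.4,
    "42F9": 1.8,
    "39G11": 1.5,
    "21C7": 1.3,
    "50G10": 0.9,
    "29F9": 0.8,
    "5F9": 0.3,
}


def fold_range(levels: dict) -> float:
    """Ratio of the highest to the lowest expression level."""
    values = list(levels.values())
    return max(values) / min(values)


if __name__ == "__main__":
    highest = max(SINGLE_COPY_PG_C_D, key=SINGLE_COPY_PG_C_D.get)
    lowest = min(SINGLE_COPY_PG_C_D, key=SINGLE_COPY_PG_C_D.get)
    print(f"highest: {highest}, lowest: {lowest}")
    print(f"fold range: {fold_range(SINGLE_COPY_PG_C_D):.1f}")
```

The ratio of 3.5 to 0.3 pg/c/d is a little under twelve, in line with the "approximately ten fold" variation noted in the text.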

Clone 20F4, the highest producing single copy
integrant, was selected for further study. Table 2 (below)
presents ELISA and cell culture data from seven day
production runs of this clone.

Table 2:

7 Day Production Run Data for 20F4

Day   % viable   Viable/ml    Tx2 (hr)   mg/L   pg/c/d
                 (x 10^5)

1     96         3.4          31         1.3    4.9
2     94         6            29         2.5    3.4
3     94         9.9          33         4.7    3.2
4     90         17.4         30         6.8    3
5     73         14                      8.3
6     17         3.5                     9.5

Clone 20F4 was seeded at 2 x 10^5 cells/ml in a 120ml
spinner flask on day 0. On the following six days, cell
counts were taken, doubling times calculated and 1ml
samples of supernatant removed from the flask and
analyzed for secreted anti-CD20 by ELISA.
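
The text does not state how the doubling times (Tx2) were calculated.
A common approach, assumed here, treats growth between two counts as
exponential, so that Tx2 = (elapsed hours) x ln 2 / ln(N_end/N_start).
The Python sketch below applies that formula to the day 3 and day 4
viable counts from Table 2 (9.9 and 17.4, in the table's units),
assuming a 24 hour interval between counts. The function name and the
24 hour default are illustrative assumptions.

```python
import math

def doubling_time(n_start: float, n_end: float, hours: float = 24.0) -> float:
    """Doubling time in hours, assuming exponential growth between two counts."""
    if n_start <= 0 or n_end <= n_start:
        raise ValueError("counts must be positive and increasing")
    return hours * math.log(2) / math.log(n_end / n_start)

# Day 3 -> day 4 viable counts from Table 2; both values share the same
# units, so only their ratio matters.
tx2 = doubling_time(9.9, 17.4)
print(f"Tx2 = {tx2:.1f} hr")
```

Under these assumptions the result is roughly 29.5 hours, in line with
the 30 hours tabulated for day 4.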

This clone is secreting on average 3-5pg
antibody/cell/day, based on this ELISA data. This is the
same level as obtained from other high expressing single
copy clones obtained previously in our laboratory using
the previously developed translationally impaired random
integration vectors. This result indicates the following:

(1) that the site in the CHO genome marked by the
Desmond marking vector is highly transcriptionally
active, and therefore represents an excellent site from
which to express recombinant proteins, and

(2) that targeting by means of homologous
recombination can be accomplished using the subject
vectors and occurs at a frequency high enough to make
this system a viable and desirable alternative to random
integration methods.

To further demonstrate the efficacy of this system,
we have also demonstrated that this site is amplifiable,
resulting in even higher levels of gene expression and
protein secretion. Amplification was achieved by plating
serial dilutions of 20F4 cells, starting at a density of
2.5 X 10* cells/ml, in 96 well tissue culture dishes, and
culturing these cells in media (CHO-SSFMII) supplemented
with 5, 10, 15 or 20nM methotrexate. Antibody secreting
clones were screened using standard ELISA techniques, and
the highest producing clones were expanded and further
analyzed. A summary of this amplification experiment is
presented in Table 3 below.

Table 3:

Summary of 20F4 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l, 96 well      Expanded   pg/c/d from spinner

10       56         3-13               4          10-15
15       27         2-14               3          15-18
20       17         4-11               1          ND

Methotrexate amplification of 20F4 was set up as de- 
scribed in the text, using the concentrations of metho- 
trexate indicated in the above table. Supernatants 
from all surviving 96 well colonies were assayed by 
ELISA, and the range of anti-CD20 expressed by these 
clones is indicated in column 3. Based on these re- 
sults, the highest producing clones were expanded to 
120ml spinners and several ELISAs conducted on the 
spinner supernatants to determine the pg/cell/day ex- 
pression levels, reported in column 5. 



The data here clearly demonstrates that this site can be 
amplified in the presence of methotrexate. Clones from 
the 10 and 15nM amplifications were found to produce on 
the order of 15-20pg/cell/day. 

A 15nM clone, designated 20F4-15A5, was selected as 
the highest expressing cell line. This clone originated 
from a 96 well plate in which only 22 wells grew, and 
was therefore assumed to have arisen from a single cell. 
The clone was then subjected to a further round of meth- 
otrexate amplification. As described above, serial 
dilutions of the culture were plated into 96 well dishes 
and cultured in CHO-SS-FMII medium supplemented with 
200, 300 or 400nM methotrexate. Surviving clones were 
screened by ELISA, and several high producing clones 
were expanded to spinner cultures and further analyzed. 
A summary of this second amplification experiment is
presented in Table 4.

Table 4:

Summary of 20F4-15A5 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l, 96 well      Expanded   pg/c/d, spinner

200      67         23-70              1          50-60
250      86         21-70              4          55-60
300      81         15-75              3          40-50

Methotrexate amplifications of 20F4-15A5 were set up
and assayed as described in the text. The highest
producing wells, the numbers of which are indicated in
column 4, were expanded to 120ml spinner flasks. The
expression levels of the cell lines derived from these
wells are recorded as pg/c/d in column 5.

The highest producing clone came from the 250nM
methotrexate amplification. The 250nM clone,
20F4-15A5-250A6, originated from a 96 well plate in which
only wells grew, and therefore is assumed to have arisen
from a single cell. Taken together, the data in Tables 3
and 4 strongly indicate that two rounds of methotrexate
amplification are sufficient to reach expression levels
of 60pg/cell/day, which is approaching the maximum
secretion capacity of immunoglobulin in mammalian cells
(Reff, M.E., Curr. Opin. Biotech., 4:573-576 (1993)).
The ability to reach this secretion capacity with just
two amplification steps further enhances the utility of
this homologous recombination system. Typically, random
integration methods require more than two amplification
steps to reach this expression level and are generally
less reliable in terms of the ease of amplification.
Thus, the homologous system offers a more efficient and
time saving method of achieving high level gene
expression in mammalian cells.
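
To put the two amplification rounds side by side, the Python sketch
below computes the approximate fold increase in spinner expression
relative to the parental clone. The figures are those reported above:
3.5 pg/c/d for 20F4 (Table 1), 15-18 pg/c/d for 20F4-15A5 (Table 3)
and 55-60 pg/c/d for the 250nM amplification (Table 4). Representing
each range by its midpoint is a simplification made here for
illustration and is not part of the original analysis.

```python
# Reported spinner expression (pg/cell/day) at each stage, as (low, high) ranges.
# Midpoints are used only as a rough summary of each range.
stages = [
    ("20F4 (unamplified)", (3.5, 3.5)),
    ("20F4-15A5 (15 nM MTX)", (15.0, 18.0)),
    ("250 nM MTX clones (Table 4)", (55.0, 60.0)),
]

def midpoint(rng):
    low, high = rng
    return (low + high) / 2.0

baseline = midpoint(stages[0][1])
fold_increase = {name: midpoint(rng) / baseline for name, rng in stages}

for name, fold in fold_increase.items():
    print(f"{name}: about {fold:.1f}-fold relative to 20F4")
```

On these midpoints the first round gives somewhat under a five fold
increase and the second round brings the total to roughly sixteen
fold.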

EXAMPLE 5 

Expression of Anti-Human CD23 Antibody
in Desmond Marked CHO Cells

CD23 is a low affinity IgE receptor which mediates
binding of IgE to B and T lymphocytes (Sutton, B.J., and
Gould, H.J., Nature, 366:421-428 (1993)). Anti-human
CD23 monoclonal antibody 5E8 is a human gamma-1
monoclonal antibody recently cloned and expressed in our
laboratory. This antibody is disclosed in commonly
assigned Serial No. 08/803,085, filed on February 20,
1997.

The heavy and light chain genes of 5E8 were cloned
into the mammalian expression vector N5KG1, a derivative
of the vector NEOSPLA (Barnett et al, in Antibody
Expression and Engineering, H.Y Yang and T. Imanaka, eds.,
pp27-40 (1995)) and two modifications were then made to
the genes. We have recently observed somewhat higher
secretion of immunoglobulin light chains compared to
heavy chains in other expression constructs in the
laboratory (Reff et al, 1997, unpublished observations).
In an attempt to compensate for this deficit, we altered
the 5E8 heavy chain gene by the addition of a stronger
promoter/enhancer element immediately upstream of the
start site. In subsequent steps, a 2.9kb DNA fragment
comprising the 5E8 modified light and heavy chain genes
was isolated from the N5KG1 vector and inserted into the
targeting vector Mandy. Preparation of 5E8-containing
Molly and electroporation into Desmond 15C9 CHO cells
was essentially as described in the preceding section.

One modification to the previously described protocol
was in the type of culture medium used. Desmond
marked CHO cells were cultured in protein-free CD-CHO
medium (Gibco-BRL, catalog # AS21206) supplemented with
3mg/L recombinant insulin (3mg/mL stock, Gibco-BRL,
catalog # AS22057) and 8mM L-glutamine (200mM stock,
Gibco-BRL, catalog # 25030-081). Subsequently,
transfected cells were selected in the above medium
supplemented with 400µg/mL geneticin. In this experiment,
20 electroporations were performed and plated into 96 well
tissue culture dishes. Cells grew and secreted anti-CD23
in a total of 68 wells, all of which were assumed to be
clones originating from a single G418-resistant cell.
Twelve of these wells were expanded to 120ml spinner
flasks for further analysis. We believe the increased
number of clones isolated in this experiment (68 compared
with 10 for anti-CD20 as described in Example 4) is due
to a higher cloning efficiency and survival rate of cells
grown in CD-CHO medium compared with CHO-SS-FMII medium.
Expression levels for those clones analyzed in spinner
culture ranged from 0.5-3pg/c/d, in close agreement with
the levels seen for the anti-CD20 clones. The highest
producing anti-CD23 clone, designated 4H12, was subjected
to methotrexate amplification in order to increase its
expression levels. This amplification was set up in a
manner similar to that described for the anti-CD20 clone
in Example 4. Serial dilutions of exponentially growing
4H12 cells were plated into 96 well tissue culture dishes
and grown in CD-CHO medium supplemented with 3mg/L
insulin, 8mM glutamine and 30, 35 or 40nM methotrexate.
A summary of this amplification experiment is presented
in Table 5.



Table 5:

Summary of 4H12 Amplification

nM MTX   # Wells    Expression Level   # Wells    Expression Level
         Assayed    mg/l, 96 well      Expanded   pg/c/d from spinner

30       100        6-24               8          10-25
35       64         4-27               2          10-15
40       96         4-20               1          ND

The highest expressing clone obtained was a 30nM clone
isolated from a plate on which 22 wells had grown.
This clone, designated 4H12-30G5, was reproducibly
secreting 18-22pg antibody per cell per day. This is
the same range of expression seen for the first
amplification of the anti-CD20 clone 20F4 (clone
20F4-15A5, which produced 15-18pg/c/d, as described in
Example 4). This data serves to further support the
observation that amplification at this marked site in CHO
is reproducible and efficient. A second amplification of
this 30nM cell line is currently underway. It is
anticipated that saturation levels of expression will be
achievable for the anti-CD23 antibody in just two
amplification steps, as was the case for anti-CD20.

EXAMPLE 6

Expression of Immunoadhesin in Desmond Marked CHO Cells

CTLA-4, a member of the Ig superfamily, is found on
the surface of T lymphocytes and is thought to play a
role in antigen-specific T-cell activation (Dariavach et
al, Eur. J. Immunol., 18:1901-1905 (1988); and Linsley
et al, J. Exp. Med., 174:561-569 (1991)). In order to
further study the precise role of the CTLA-4 molecule in
the activation pathway, a soluble fusion protein
comprising the extracellular domain of CTLA-4 linked to a
truncated form of the human IgG1 constant region was
created (Linsley et al (Id.)). We have recently
expressed this CTLA-4 Ig fusion protein in the mammalian
expression vector BLECH1, a derivative of the plasmid
NEOSPLA (Barnett et al, in Antibody Expression and
Engineering, H.Y Yang and T. Imanaka, eds., pp27-40 (1995)).
An 800bp fragment encoding the CTLA-4 Ig was isolated
from this vector and inserted between the SacII and
BglII sites in Molly.

Preparation of CTLA-4Ig-Molly and electroporation
into Desmond clone 15C9 CHO cells was performed as
described in the previous example relating to anti-CD20.
Twenty electroporations were carried out, and plated
into 96 well culture dishes as described previously.
Eighteen CTLA-4 expressing wells were isolated from the
96 well plates and carried forward to the 120ml spinner
stage. Southern analyses on genomic DNA isolated from
each of these clones were then carried out to determine
how many of the homologous clones contained additional
random integrants. Genomic DNA was digested with BglII
and probed with a PCR generated digoxigenin labelled
probe to the human IgG1 constant region. The results of
this analysis indicated that 85% of the CTLA-4 clones
are homologous integrants only; the remaining 15%
contained one additional random integrant. This result
corroborates the findings from the expression of
anti-CD20 discussed above, where 80% of the clones were
single homologous integrants. Therefore, we can conclude
that this expression system reproducibly yields single
targeted homologous integrants in at least 80% of all
clones produced.

WO 98/41645    PCT/US98/03935

Expression levels for the homologous CTLA4-Ig
clones ranged from 8-12pg/cell/day. This is somewhat
higher than the range reported for anti-CD20 antibody
and anti-CD23 antibody clones discussed above. However,
we have previously observed that expression of this
molecule using the intronic insertion vector system also
resulted in significantly higher expression levels than
are obtained for immunoglobulins. We are currently
unable to provide an explanation for this observation.

EXAMPLE 7 

Targeting Anti-CD20 to an Alternate Desmond Marked CHO
Cell Line

As we described in a preceding section, we obtained
5 single copy Desmond marked CHO cell lines (see Figures
4 and 5). In order to demonstrate that the success of
our targeting strategy is not due to some unique property
of Desmond clone 15C9 and limited only to this clone,
we introduced anti-CD20 Molly into Desmond clone 9B2
(lane 6 in Figure 4, lane 1 in Figure 5). Preparation
of Molly DNA and electroporation into Desmond 9B2 was
exactly as described in the previous example pertaining
to anti-CD20. We obtained one homologous integrant from
this experiment. This clone was expanded to a 120ml
spinner flask, where it produced on average 1.2pg
anti-CD20/cell/day. This is considerably lower expression
than we observed with Molly targeted into Desmond 15C9.
However, this was the anticipated result, based on our
northern analysis of the Desmond clones. As can be seen
in Figure 5, mRNA levels from clone 9B2 are considerably
lower than those from 15C9, indicating the site in this
clone is not as transcriptionally active as that in
15C9. Therefore, this experiment not only demonstrates
the reproducibility of the system (presumably any
marked Desmond site can be targeted with Molly), it also
confirms the northern data that the site in Desmond 15C9
is the most transcriptionally active.

From the foregoing, it will be appreciated that,
although specific embodiments of the invention have been
described herein for purposes of illustration, various
modifications may be made without deviating from the
scope of the invention. Accordingly, the invention is
not limited except as by the appended claims.

WHAT IS CLAIMED IS:

1. A method for inserting a desired DNA at a 
target site in the genome of a mammalian cell which 
comprises the following steps: 

(i) transfecting or transforming a mammalian cell
with a first plasmid ("marker plasmid") containing the
following sequences:

(a) a region of DNA that is heterologous to
the mammalian cell genome which when integrated in the
mammalian cell genome provides a unique site for
homologous recombination;

(b) a DNA fragment encoding a portion of a 
first selectable marker protein; and 

(c) at least one other selectable marker DNA 
that provides for selection of mammalian cells which 
have been successfully integrated with the marker plas- 
mid; 

(ii) selecting a cell which contains the marker
plasmid integrated in its genome;

(iii) transfecting or transforming said selected 
cell with a second plasmid ("target plasmid") which 
contains the following sequences: 

(a) a region of DNA that is identical or is
sufficiently homologous to the unique region in the
marker plasmid such that this region of DNA can
recombine with said DNA via homologous recombination;

(b) a DNA fragment encoding a portion of the
same selectable marker contained in the marker plasmid,
wherein the active selectable marker protein encoded by
said DNA is only produced if said fragment is expressed
in association with the fragment of said selectable
marker DNA contained in the marker plasmid; and

(iv) selecting cells which contain the target
plasmid integrated at the target site by screening for the
expression of the first selectable marker protein.

2. The method of Claim 1, wherein the DNA fragment
encoding a fragment of a first selectable marker is
an exon of a dominant selectable marker.

3. The method of Claim 2, wherein the second
plasmid contains the remaining exons of said first
selectable marker.

4. The method of Claim 3, wherein at least one 
DNA encoding a desired protein is inserted between said 
exons of said first selectable marker contained in the 
target plasmid. 



5. The method of Claim 4, wherein a DNA encoding a
dominant selectable marker is further inserted between
the exons of said first selectable marker contained in
the target plasmid to provide for co-amplification of
the DNA encoding the desired protein.

6. The method of Claim 3, wherein the first dominant
selectable marker is selected from the group consisting
of neomycin phosphotransferase, histidinol
dehydrogenase, dihydrofolate reductase, hygromycin
phosphotransferase, herpes simplex virus thymidine kinase,
adenosine deaminase, glutamine synthetase, and
hypoxanthine-guanine phosphoribosyl transferase.

7. The method of Claim 4, wherein the desired
protein is a mammalian protein.

8. The method of Claim 7, wherein the protein is 
an immunoglobulin. 

9. The method of Claim 1, which further comprises
determining the RNA levels of the selectable marker (c)
contained in the marker plasmid prior to integration of
the target vector.

10. The method of Claim 9, wherein the other
selectable marker contained in the marker plasmid is a
dominant selectable marker selected from the group
consisting of histidinol dehydrogenase, herpes simplex
thymidine kinase, hygromycin phosphotransferase,
adenosine deaminase and glutamine synthetase.

11. The method of Claim 1, wherein the mammalian
cell is selected from the group consisting of Chinese
hamster ovary (CHO) cells, myeloma cells, baby hamster
kidney cells, COS cells, NSO cells, HeLa cells and NIH
3T3 cells.

12. The method of Claim 11, wherein the cell is a 
CHO cell. 

13. The method of Claim 1, wherein the marker
plasmid contains the third exon of the neomycin
phosphotransferase gene and the target plasmid contains the
first two exons of the neomycin phosphotransferase gene.

14. The method of Claim 1, wherein the marker
plasmid further contains a rare restriction endonuclease
sequence which is inserted within the region of
homology.



15. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is a bacterial DNA, a viral DNA or a synthetic DNA.

16. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is at least 300 nucleotides.

17. The method of Claim 16, wherein the unique 
region of DNA ranges in size from about 300 nucleotides 
to 20 kilobases. 

18. The method of claim 11, wherein the unique 
region of DNA preferably ranges in size from 2 to 10 
kilobases. 

19. The method of Claim 1, wherein the first
selectable marker DNA is split into at least three
exons.

20. The method of Claim 1, wherein the unique
region of DNA that provides for homologous recombination
is a bacterial DNA, an insect DNA, a viral DNA or a
synthetic DNA.

21. The method of Claim 20, wherein the unique
region of DNA does not contain any functional genes.

22. A vector system for inserting a desired DNA at
a target site in the genome of a mammalian cell which
comprises at least the following:

(i) a first plasmid ("marker plasmid") containing
at least the following sequences:

(a) a region of DNA that is heterologous to
the mammalian cell genome which when integrated in the
mammalian cell genome provides a unique site for
homologous recombination;

(b) a DNA fragment encoding a portion of a 
first selectable marker protein; and 

(c) at least one other selectable marker DNA 
that provides for selection of mammalian cells which 
have been successfully integrated with the marker plas- 
mid; and 

(ii) a second plasmid ("target plasmid") which con- 
tains at least the following sequences: 

(a) a region of DNA that is identical or is 
sufficiently homologous to the unique region in the 
marker plasmid such that this region of DNA can recom- 
bine with said DNA via homologous recombination; 

(b) a DNA fragment encoding a portion of the
same selectable marker contained in the marker plasmid,
wherein the active selectable marker protein encoded by
said DNA is only produced if said fragment is expressed
in association with the fragment of said selectable
marker DNA contained in the marker plasmid.

23. The vector system of Claim 22, wherein the DNA
fragment encoding a fragment of a first selectable
marker is an exon of a dominant selectable marker.

24. The vector system of Claim 23, wherein the
second plasmid contains the remaining exons of said
first selectable marker.

25. The vector system of Claim 24, wherein at 
least one DNA encoding a desired protein is inserted 
between said exons of said first selectable marker con- 
tained in the target plasmid. 

26. The vector system of Claim 24, wherein a DNA
encoding a dominant selectable marker is further
inserted between the exons of said first selectable marker
contained in the target plasmid to provide for
co-amplification of the DNA encoding the desired protein.

27. The vector system of Claim 24, wherein the
first dominant selectable marker is selected from the
group consisting of neomycin phosphotransferase,
histidinol dehydrogenase, dihydrofolate reductase,
hygromycin phosphotransferase, herpes simplex virus
thymidine kinase, adenosine deaminase, glutamine
synthetase, and hypoxanthine-guanine phosphoribosyl
transferase.

28. The vector system of Claim 25, wherein the
desired protein is a mammalian protein.

29. The vector system of Claim 28, wherein the
protein is an immunoglobulin.



30. The vector system of Claim 22, wherein the
other selectable marker contained in the marker plasmid
is a dominant selectable marker selected from the group
consisting of histidinol dehydrogenase, herpes simplex
thymidine kinase, hygromycin phosphotransferase,
adenosine deaminase and glutamine synthetase.

31. The vector system of Claim 22, which provides
for insertion of a desired DNA at a targeted site in the
genome of a mammalian cell selected from the group
consisting of Chinese hamster ovary (CHO) cells, myeloma
cells, baby hamster kidney cells, COS cells, NSO cells,
HeLa cells and NIH 3T3 cells.

32. The vector system of Claim 31, wherein the 
mammalian cell is a CHO cell. 

33. The vector system of Claim 22, wherein the
marker plasmid contains the third exon of the neomycin
phosphotransferase gene and the target plasmid contains
the first two exons of the neomycin phosphotransferase
gene.

34. The vector system of Claim 22, wherein the 
marker plasmid further contains a rare restriction endo- 
nuclease sequence which is inserted within the region of 
homology. 

35. The vector system of Claim 22, wherein the 
unique region of DNA that provides for homologous recom- 
bination is a bacterial DNA, a viral DNA or a synthetic 
DNA. 

36. The vector system of Claim 22, wherein the
unique region of DNA (a) contained in the marker plasmid
vector system that provides for homologous recombination
is at least 300 nucleotides.

37. The vector system of Claim 36, wherein the 
unique region of DNA ranges in size from about 300 
nucleotides to 20 kilobases. 



38. The vector system of Claim 37, wherein the
unique region of DNA preferably ranges in size from 2 to
10 kilobases.

39. The vector system of Claim 22, wherein the
first selectable marker DNA is split into at least three
exons.

40. The vector system of Claim 22, wherein the
unique region of DNA that provides for homologous
recombination is a bacterial DNA, an insect DNA, a viral DNA
or a synthetic DNA.

41. The vector system of Claim 40, wherein the
unique region of DNA does not contain any functional
genes.

N2 = Neomycin phosphotransferase exon 2
VL = Anti-CD20 light chain leader + variable
K = Human kappa constant
G1 = Human gamma 1 constant

FIG. 9

[Plasmid map, 19,035 bp. Restriction sites legible in the source
include Pme I (726), Swa I (1020), Nru I (1986), PshA I (3130),
Sfi I (6631) and BamH I (10836).]

D = Inactive dihydrofolate reductase
E = CMV and SV40 enhancers
H = Inactive Salmonella histidinol dehydrogenase
T = Herpes simplex thymidine kinase promoter and polyoma enhancer
C = Cytomegalovirus promoter/enhancer
N1 = Neomycin phosphotransferase exon 1
K = Human kappa constant
VL = Variable light chain anti-CD23 primate 5E8 and leader
VH = Variable heavy chain anti-CD23 primate 5E8N- and leader
B = Bovine growth hormone polyadenylation
N2 = Neomycin phosphotransferase exon 2
G1 = Human gamma 1 constant

Mandy cut Xba I/Xho I and ligated to Xba I/Xho I fragment
from XKG1+CD23 5E8N-SHL

Map by Mitchell Reff. Constructed by Karen McLachlan 06/26/97. 19,035 bp.
Noncutters = AflII, AvrII, HindIII, I-PpoI, I-SceI, PmlI, RsrII, SgfI, SrfI